Inside Python's Range: How Losing an Argument Taught Me About Iterables


A while ago, a friend and I got into a debate about the type of Python's built-in range. I'll admit I had never really looked into it. And why would I? It's such a simple thing you learn at the start of your Python journey! I assumed, based on its behavior and its lowercase name, that it was a simple function you'd call with min, max, and step arguments to get back a list of numbers. Simple.

My friend firmly disagreed. He told me to look it up, and he had a weighty argument: Python's official documentation. There it is written, plain as day: class range([...]). Not a function. A class. An object. Oops.

So he was right, obviously. But while I gracefully conceded, I got curious. Why redesign range as an object? What is the point of instantiating an object if you simply need to return a list?

1. Poking Around in Python

Python 2: Where I Was (Kind Of) Right

Turns out my initial assumption wasn't wrong, just a bit outdated. In Python 2, range() was indeed a function returning a list, which you can actually check very easily:

print(type(range))
print(type(range(10)))
<type 'builtin_function_or_method'>
<type 'list'>

Want all the numbers from 0 to 4? Easy. range(5) creates a list for you and returns it: [0, 1, 2, 3, 4]. It just works... until you need something big. Try range(10**8) (that's 100 million numbers...). Python 2 would compute all 100,000,000 integers and store them in memory at once (which takes a bit more than 6GB on my machine!).

Python 2 has a workaround, namely xrange(), which produces its values lazily instead of building a list. But for Python 3, the developers streamlined things: they dropped the old list-returning range() function entirely and made lazy generation the new standard, thanks to the range object.

Python 3: Where I Was (Really) Wrong

In Python 3, range is a class, and calling it gives you an object with its own type, which is really easy to check:

print(type(range(10)))
<class 'range'>

Let's probe it a bit further.

help(range(10))
class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step. [...]
 |
 |  Methods defined here:
 |
 |  __bool__(self, /)
 |      True if self else False
 |
 |  __contains__(self, key, /)
 |      Return bool(key in self).
 |
 |  __iter__(self, /)
 |      Implement iter(self).
 |
 |  __len__(self, /)
 |      Return len(self).
 |
 |  __repr__(self, /)
 |      Return repr(self).
 |
 |  __reversed__(self, /)
 |      Return a reverse iterator. 
[...]

Notice the __iter__() method? That’s what makes the range object an iterable. Just like with a list or a string, Python (or you) can call this method to get an iterator: a special helper object designed to help you loop over the iterable that it was created from. Iterators in Python work by implementing a __next__() method, which returns the next element each time it’s called.

Still a bit fuzzy? It's quite uncommon to have to explicitly instantiate an iterator, but let’s do it for the sake of learning.

it = range(10).__iter__()
help(it)
class range_iterator(object)
 |  Methods defined here:
 |
 |  __iter__(self, /)
 |      Implement iter(self).
 |
 |  __length_hint__(self, /)
 |      Private method returning an estimate of len(list(it)).
 |
 |  __next__(self, /)
 |      Implement next(self).
[...]

And now, to use it:

print(it.__next__())
0
print(it.__next__())
1
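
In everyday code you would reach for the built-in iter() and next() functions rather than calling the dunder methods directly. The short sketch below (the variable names are only for illustration) also shows what happens once the iterator runs dry: next() raises StopIteration.

```python
numbers = iter(range(2))   # same as range(2).__iter__()

first = next(numbers)      # same as numbers.__next__()
second = next(numbers)
print(first, second)       # 0 1

# The iterator is now exhausted: one more next() raises StopIteration
try:
    next(numbers)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)           # True
```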

So why all this complexity? Two main reasons: ease of use and memory efficiency.

First, ease of use. Each time you call __iter__() on an iterable object, Python returns a new iterator. This means the same iterable can be reused multiple times without being “consumed.” For example, in range's case:

x = range(10)
for i in x: print(i) # Works
for i in x: print(i) # Still works!
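
The distinction between the iterable and its iterators is worth seeing side by side. In this small sketch (names are illustrative), two iterators created from the same range advance independently, while a single iterator can only be walked once.

```python
x = range(3)

a = iter(x)
b = iter(x)
next(a)
next(a)                    # a has moved past 0 and 1
first_of_b = next(b)       # b is unaffected and still starts at 0
print(first_of_b)          # 0

it = iter(x)
first_pass = list(it)      # consumes the iterator
second_pass = list(it)     # nothing left to yield
print(first_pass, second_pass)   # [0, 1, 2] []

# The range itself is never consumed
print(list(x), list(x))    # [0, 1, 2] [0, 1, 2]
```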

Second, memory efficiency. As the docs point out:

The advantage of the range type over a regular list [...] is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents.

Remember that range(10**8) disaster in Python 2? Watch how Python 3 handles it:

from sys import getsizeof
print(getsizeof(range(10**8)))
48

Yep! Just 48 bytes. The same memory footprint as range(10) or range(10**10). Under the hood, a range object stores only a handful of values and computes each item on the fly as you iterate. No precomputed lists, no crazy memory allocation. Pretty neat, right?
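
To make the comparison concrete, here is a quick check you can run yourself. It assumes CPython (sys.getsizeof is implementation-specific), and exact byte counts may differ on your machine, so the snippet compares sizes instead of printing fixed numbers.

```python
from sys import getsizeof

small = range(10)
huge = range(10**10)

# Both range objects report the same size, whatever span they cover
same_size = getsizeof(small) == getsizeof(huge)
print(same_size)

# A list, by contrast, stores one reference per element
as_list = list(range(1000))
list_is_bigger = getsizeof(as_list) > getsizeof(range(1000))
print(list_is_bigger)

# Items are computed on demand: indexing does not iterate
item = huge[10**9]
print(item)
```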

2. Under the Hood: CPython Implementation

The reference Python interpreter is CPython. If you're curious about what’s really going on under the hood, you can take a peek at its source code. But fair warning: it’s written in C, and it’s not exactly light reading. You'll find object-oriented patterns awkwardly wedged into a language that doesn’t natively support them, along with manual memory management (yep, no garbage collector here!) and low-level integer type handling (int and long). On top of that, the developers included several optimizations to handle large ranges efficiently.

Rather than going into every detail, let’s focus on a few key parts. First up: the struct that defines the internal representation of a range object.

typedef struct {
    PyObject_HEAD
    PyObject *start;
    PyObject *stop;
    PyObject *step;
    PyObject *length;  // Precomputed length
} rangeobject;

These are what you’d think of as the attributes of a range object in OOP terms. Take note of what’s here, and more importantly, what’s not: just a few variables, no massive array of precomputed values needed.
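
The first three fields of that struct are visible from Python as read-only attributes, and the precomputed length is what len() reports. A quick sketch:

```python
r = range(2, 20, 3)              # 2, 5, 8, 11, 14, 17

print(r.start, r.stop, r.step)   # 2 20 3
print(len(r))                    # 6

# range objects are immutable: the attributes cannot be reassigned
try:
    r.start = 0
    mutable = True
except AttributeError:
    mutable = False
print(mutable)                   # False
```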

Now let’s take a look at the __iter__() method implementation. (I've simplified the code and added my own comments.)

static PyObject *range_iter(PyObject *seq)
{
    /* r is roughly equivalent to self, the instance of the object we are working with */
    rangeobject *r = (rangeobject *)seq;

    /* Get attribute values from instance */
    long lstart, lstop, lstep;
    unsigned long ulen;
    lstart = PyLong_AsLong(r->start);
    lstop = PyLong_AsLong(r->stop);
    lstep = PyLong_AsLong(r->step);

    /* Calculate range's length */
    ulen = get_len_of_range(lstart, lstop, lstep);

    return fast_range_iter(lstart, lstop, lstep, (long)ulen);
}

static PyObject *fast_range_iter(long start, long stop, long step, long len)
{
    /* Allocates space for a new _PyRangeIterObject (roughly like instantiating the object) */
    _PyRangeIterObject *it = _Py_FREELIST_POP(_PyRangeIterObject, range_iters);

    /* Initialize attributes in the new iterator */
    it->start = start;
    it->step = step;
    it->len = len;

    /* Returns the new iterator*/
    return (PyObject *)it;
}

Don't be surprised to see a call to a function that’s defined after it's used. That’s perfectly legal in C, as long as the function's prototype has been declared before (usually in a header file). As we discovered earlier when poking around in Python, every call to __iter__() creates a fresh iterator. It is given the start and step values, but, interestingly, not the stop. Instead, the total number of steps (i.e., the length) is computed ahead of time and passed into the iterator's constructor.
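
The length computation itself is just integer arithmetic. Below is a rough Python sketch of the idea. The function name range_length is my own, and this is a simplified illustration, not a transcription of CPython's get_len_of_range(), which additionally has to cope with the limits of C integers.

```python
def range_length(start, stop, step):
    """Number of items in range(start, stop, step) (illustrative sketch)."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0 and start < stop:
        # Whole steps that fit between start and the last reachable value, plus one
        return 1 + (stop - 1 - start) // step
    if step < 0 and start > stop:
        # Same idea, mirrored for a decreasing range
        return 1 + (start - 1 - stop) // (-step)
    return 0   # empty range


print(range_length(0, 5, 2))     # 3  (0, 2, 4)
print(range_length(5, 0, -2))    # 3  (5, 3, 1)
print(range_length(5, 0, 1))     # 0
```

Comparing the result against len(range(start, stop, step)) for a handful of inputs is an easy way to convince yourself the formula holds.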

Now let's look at the implementation of the __next__() function for this iterator.

static PyObject *rangeiter_next(PyObject *op)
{
    _PyRangeIterObject *r = (_PyRangeIterObject*)op;
    if (r->len > 0) {
        long result = r->start;
        r->start = result + r->step;
        r->len--;
        return PyLong_FromLong(result);
    }
    return NULL;
}

Each call returns the current start value, then advances start by step in place, ready for the next call. The len field is decremented to keep track of how many values are left. Once it hits zero, the function returns NULL, signaling that the iteration is complete. And if you think about it, this approach makes a lot of sense! Relying on the stop value alone would require checking different conditions depending on whether the range is increasing or decreasing. With a precomputed counter, the check becomes much simpler, likely just a single Jump If Zero (JZ) instruction at the assembly level.
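
For contrast, here is a hypothetical stop-based variant written as a Python generator (the name stop_based_range is mine, not something from CPython). Its loop condition has to branch on the sign of step, which is precisely the bookkeeping the precomputed counter avoids.

```python
def stop_based_range(start, stop, step):
    """Yield values by comparing against stop (illustrative sketch)."""
    if step == 0:
        raise ValueError("step must not be zero")
    current = start
    # The direction of the comparison depends on the sign of step
    while (step > 0 and current < stop) or (step < 0 and current > stop):
        yield current
        current += step


print(list(stop_based_range(0, 5, 2)))    # [0, 2, 4]
print(list(stop_based_range(5, 0, -2)))   # [5, 3, 1]
```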

3. Simplified Python Implementation

While CPython’s implementation is complex, we can hopefully get a better understanding of range's behavior by building a simplified, pure Python version. Interestingly, CPython’s own test suite includes an implementation of range, written in Python but as a generator. Here, we’ll try to create something that more closely mirrors the underlying C code.

class SimpleRange:
    def __init__(self, start, stop=None, step=1):
        # Handle range(5) -> range(0,5)
        if stop is None:
            self.start = 0
            self.stop = start
        else:
            self.start = start
            self.stop = stop
        self.step = step

        # Precompute number of steps - like C's get_len_of_range()
        # Ceiling division, clamped at 0 for empty ranges (positive steps only)
        length = max(0, (self.stop - self.start + step - 1) // step)
        self.length = length

    def __iter__(self):
        # Return a new iterator for each call
        return SimpleRangeIterator(self.start, self.step, self.length)

    def __len__(self):
        return self.length

class SimpleRangeIterator:
    def __init__(self, start, step, length):
        self.current = start
        self.step = step
        self.remaining = length

    def __iter__(self):
        # An iterator is itself iterable: it simply returns itself
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration

        value = self.current
        self.current += self.step
        self.remaining -= 1
        return value

Just like in the C implementation, our SimpleRange class precomputes the length when it's created. When __iter__() is called, it creates a fresh SimpleRangeIterator instance. Notice how this iterator doesn't need the stop value at all: it simply decrements an internal counter on each call to __next__(), and raises a StopIteration exception when that counter hits zero.

Note that our simplified version lacks many of the features of the real range (notably support for negative steps). Still, tying it all together, it helps build a much deeper understanding of what's really happening when the interpreter encounters a for loop:

rng = range(5)       # range instance
rng_it = rng.__iter__()        # range_iterator instance
while True:
    try:
        i = rng_it.__next__()  # get next value
        # Do something with i
    except StopIteration: # rng_it.len == 0, it's the end of the range
        break

Yes. All of this in a simple:

for i in range(5):
    pass  # Do something with i

Isn't Python beautiful?

Conclusion

In closing, I'd like to thank you for joining me on this dive into range's internals, dear reader. Whether you've gained technical insights or simply been reminded how you can transform mistakes into curiosity, I hope this journey was worthwhile. And before you write your next for i in range(...):, please take a moment to appreciate the developers who carefully abstracted away all this complexity to let you focus on what matters to you.

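And for those still wondering about the naming: range isn't capitalized because PEP 8 describes a separate convention for built-in names. Most are single lowercase words (or two words run together), with CapWords used only for exception names and built-in constants. A reminder to always RTFM!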