Inside Python's Range: How Losing an Argument Taught Me About Iterables


A while ago, a friend and I got into a debate about the type of Python's built-in range. I'll admit I had never really looked into it. And why would I? It's such a simple thing you learn at the start of your Python journey! Based on its behavior and its lowercase name, I assumed it was a simple function you'd call with start, stop, and step arguments to get back a list of numbers. Simple.

My friend firmly disagreed. He told me to look it up, and he had a weighty argument: Python's official documentation. There it is, written plain as day: class range([...]). Not a function. A class. An object. Oops.

So he was right, obviously. But instead of just conceding, I thought this was the perfect opportunity to dig deeper. Why redesign range as an object? What is the point of instantiating an object if all you need is a list?

Python2: Where I Was (Kind Of) Right

Turns out my initial assumption wasn't wrong, just a bit outdated. In Python 2, range() was indeed a function returning a list, which you can actually check very easily:

print(type(range))      # <type 'builtin_function_or_method'>
print(type(range(10)))  # <type 'list'>

Want all the numbers from 0 to 4? Easy. range(5) creates a list and returns it: [0, 1, 2, 3, 4]. It just works... until you need something big. Try range(10**8) (that's 100 million numbers). Python2 would compute all 100,000,000 integers and store them in memory at once (which takes a bit more than 6GB on my machine!).

Python2 has a workaround, namely xrange(), which produces its numbers lazily, one at a time, instead of building a list up front. For Python3, the developers streamlined things: they dropped the old list-building range() entirely and made lazy generation the standard through the range object (essentially xrange() under the old name).
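In Python3 you can still opt into the old eager behavior by wrapping a range in list(). A quick comparison using sys.getsizeof (which, for a list, counts only its internal array of pointers and not the int objects they point to, so it actually understates the list's real cost):

```python
from sys import getsizeof

n = 10**6

lazy = range(n)          # Python3 range: a few stored values, nothing precomputed
eager = list(range(n))   # a fully materialized list, like Python2's range(n)

print(getsizeof(lazy))   # small, and the same for any n
print(getsizeof(eager))  # grows roughly linearly with n
print(eager == list(lazy))  # True: same numbers, very different storage
```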

Python3: Where I Was (Really) Wrong

Indeed, in Python3, range is an object with its own type:

print(type(range(10)))  # <class 'range'>

But it's not just any object: it's an iterable (in fact, a full immutable sequence type). This means it implements __iter__(), which lets you use it in for loops, where the numbers are generated on the fly. So why this more complex structure? In two words: memory efficiency. As the docs state:

The advantage of the range type over a regular list [...] is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents.
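The difference between an iterable (something you can loop over) and an iterator (the object that actually hands out the values) is easy to check from Python itself. A small sketch using the abstract base classes in collections.abc; range_iterator is the name CPython gives to the iterator type that range's __iter__() returns:

```python
from collections.abc import Iterable, Iterator, Sequence

r = range(3)

print(isinstance(r, Iterable))   # True: it has __iter__()
print(isinstance(r, Iterator))   # False: it has no __next__()
print(isinstance(r, Sequence))   # True: it also supports len(), indexing and `in`

it = iter(r)                     # calls r.__iter__() behind the scenes
print(type(it).__name__)         # range_iterator (in CPython)
print(next(it), next(it), next(it))  # 0 1 2

try:
    next(r)                      # the range itself cannot be stepped through
except TypeError as exc:
    print("range is not an iterator:", exc)
```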

Remember that range(10**8) disaster in Python2? Watch how Python3 handles it:

from sys import getsizeof
print(getsizeof(range(10**8)))  # 48

Yes. 48 bytes (on a 64-bit CPython build). The exact same memory footprint as range(10) or range(10**10). Under the hood, the range object stores only a few values, most importantly the start, stop, and step it was given, and computes the right number at each loop iteration. No precomputed list, no memory spike. Pretty neat, huh?
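Because only start, stop, and step (plus a precomputed length) are stored, range can answer most questions with arithmetic instead of iteration. A sketch on a range far too large to ever materialize as a list:

```python
r = range(0, 10**18, 3)

# The stored parameters are exposed as read-only attributes
print(r.start, r.stop, r.step)   # 0 1000000000000000000 3

# len(), indexing and membership are computed from start and step,
# not by walking through the values
print(len(r))                    # 333333333333333334
print(r[10**15])                 # 3 * 10**15, i.e. start + index * step
print(r[-1])                     # 999999999999999999
print(999 in r, 1000 in r)       # True False
```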

Under the Hood: CPython Implementation

The reference Python interpreter, CPython, is written in C. If you want to see what's really happening behind the scenes, you can peek at its source code. But be warned, dear reader: it's C, and it's not an easy read. It's filled with OOP patterns shoehorned into a language that doesn't support them natively, and it has to deal with manual memory management (because C doesn't do that for you!) and low-level integer handling (int and long...). The developers also implemented optimizations to deal with very large ranges. So instead of a tedious in-depth tour, let's focus on a few key parts. First, the struct underlying the range object:

typedef struct {
    PyObject_HEAD
    PyObject *start;
    PyObject *stop;
    PyObject *step;
    PyObject *length;  // Precomputed length
} rangeobject;

Notice what's here, and more importantly what's not: just four pointers to Python int objects, plus the standard object header. No giant array of numbers. Let's take a look at the __iter__() code (the code is simplified and the comments are mine).

static PyObject *range_iter(PyObject *seq)
{
    /* r is roughly equivalent to self, the instance of the object we are working with */
    rangeobject *r = (rangeobject *)seq;

    /* Get attribute values from instance */
    long lstart, lstop, lstep;
    unsigned long ulen;
    lstart = PyLong_AsLong(r->start);
    lstop = PyLong_AsLong(r->stop);
    lstep = PyLong_AsLong(r->step);

    /* Calculate range's length */
    ulen = get_len_of_range(lstart, lstop, lstep);

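    /* The full source first checks that all values fit in a C long,
       falling back to a slower arbitrary-precision iterator otherwise */
    return fast_range_iter(lstart, lstop, lstep, (long)ulen);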
}

static PyObject *fast_range_iter(long start, long stop, long step, long len)
{
    /* Takes a recycled _PyRangeIterObject from a free list (the full source
       allocates a fresh one when the list is empty); roughly like instantiating the object */
    _PyRangeIterObject *it = _Py_FREELIST_POP(_PyRangeIterObject, range_iters);

    /* Initializes 3 attributes in the new _PyRangeIterObject; note that stop is not kept */
    it->start = start;
    it->step = step;
    it->len = len;

    /* Returns the object */
    return (PyObject *)it;
}

Surprise! Each time __iter__() is called, a fresh iterator is created. This lets the Python programmer reuse a range, for example:

x = range(10)
for i in x: print(i)  # Works
for i in x: print(i)  # Still works!
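This is also what separates range from a generator, which is its own iterator and is used up after a single pass. A quick comparison; gen_numbers is a hypothetical generator written only to produce the same values as range(n):

```python
def gen_numbers(n):
    # Hypothetical generator yielding the same values as range(n)
    i = 0
    while i < n:
        yield i
        i += 1

x = range(3)
g = gen_numbers(3)

print(list(x), list(x))    # [0, 1, 2] [0, 1, 2]: each pass gets a fresh iterator
print(list(g), list(g))    # [0, 1, 2] []: the generator is exhausted after one pass

print(iter(x) is iter(x))  # False: two independent range iterators
print(iter(g) is g)        # True: a generator returns itself from __iter__()
```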

Now let's look at the implementation of the __next__() function of this new iterator.

static PyObject *rangeiter_next(PyObject *op)
{
    _PyRangeIterObject *r = (_PyRangeIterObject*)op;
    if (r->len > 0) {
        long result = r->start;
        r->start = result + r->step;
        r->len--;
        return PyLong_FromLong(result);
    }
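    /* Returning NULL without setting an exception signals that the iterator
       is exhausted (Python code sees StopIteration) */
    return NULL;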
}

Each call computes just one number: it saves the current start, advances start by step for the next iteration, and returns the saved value. It also decrements len, which counts the values still to come, so the iterator knows when it has reached stop without ever comparing against it.
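That len counter is set once, before iteration starts, by get_len_of_range() in range_iter. Below is a pure-Python sketch of the counting formula, modeled on the approach that function takes (the name len_of_range is hypothetical, and unlike the C version it doesn't have to worry about overflow). It handles negative steps too and can be checked against the built-in len(range(...)):

```python
def len_of_range(lo, hi, step):
    """Count the values in range(lo, hi, step); a sketch, not CPython's code."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0 and lo < hi:
        # Last value lo + (n - 1) * step must be <= hi - 1
        return 1 + (hi - 1 - lo) // step
    if step < 0 and lo > hi:
        # Mirror image: last value lo + (n - 1) * step must be >= hi + 1
        return 1 + (lo - 1 - hi) // -step
    return 0  # empty range, e.g. range(5, 0) or range(0, 5, -1)

# Compare with the built-in for a handful of cases
for args in [(0, 5, 1), (0, 5, 2), (5, 0, -1), (5, 0, -2), (3, 3, 1), (0, 5, -1)]:
    print(args, len_of_range(*args), len(range(*args)))
```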

Simplified Python Implementation

While CPython's implementation is complex, we can explore range's behavior in simplified pure Python. In fact, range's algorithm is implemented in Python in CPython's tests, but as a generator. Ours will be more faithful to the actual C implementation:

class SimpleRange:
    def __init__(self, start, stop=None, step=1):
        # Handle range(5) → range(0,5)
        if stop is None:
            self.start = 0
            self.stop = start
        else:
            self.start = start
            self.stop = stop
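        # This sketch only handles positive steps
        if step <= 0:
            raise ValueError("SimpleRange only supports positive steps")
        self.step = step

        # Precompute length - like C's get_len_of_range().
        # Ceiling division, clamped to 0 for empty ranges such as SimpleRange(5, 0).
        self.length = max(0, (self.stop - self.start + step - 1) // step)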

    def __iter__(self):
        # Return a new iterator for each call
        return SimpleRangeIterator(self.start, self.step, self.length)

    def __len__(self):
        return self.length

class SimpleRangeIterator:
    def __init__(self, start, step, length):
        self.current = start
        self.step = step
        self.remaining = length

    def __iter__(self):
        # An iterator returns itself, as the iterator protocol requires
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration

        value = self.current
        self.current += self.step
        self.remaining -= 1
        return value

Just like in the C implementation, our SimpleRange class precomputes the length when created. When __iter__() is called, it instantiates a fresh SimpleRangeIterator object. Notice how this iterator doesn't need the stop value at all: it determines when to end by decrementing the remaining counter with each __next__() call.

While this simplified version lacks many features of real range (notably negative step support), it gives a good intuition of the underlying concepts. Here's how Python would use it behind the scenes when you write a for loop:

x = SimpleRange(5)       # x is a SimpleRange instance
it = x.__iter__()        # Creates the iterator (implicit in for loops)
while True:
    try:
        i = it.__next__()  # Gets the next value
    except StopIteration:
        break              # The iterator is exhausted
    print(i)               # Do something with i

Yes. All of this in a simple:

for i in range(5):
    ...  # Do something with i

Isn't Python beautiful?
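If you'd like to convince yourself that the two forms really are equivalent, here is a small check. The built-ins iter() and next() are the usual way to call __iter__() and __next__() without spelling out the dunder methods; the helper name manual_for is hypothetical:

```python
def manual_for(iterable):
    """Collect values the way a for loop would, using the explicit protocol."""
    collected = []
    it = iter(iterable)        # invokes iterable.__iter__()
    while True:
        try:
            i = next(it)       # invokes it.__next__()
        except StopIteration:
            break              # the iterator signalled it is exhausted
        collected.append(i)    # the "loop body"
    return collected

print(manual_for(range(5)))                            # [0, 1, 2, 3, 4]
print(manual_for(range(5)) == [i for i in range(5)])   # True
```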

Conclusion

In closing, thank you for joining me on this dive into range's internals, dear reader. Whether you gained technical insights or were simply reminded how to turn mistakes into curiosity, I hope the journey was valuable. And let me end with this plea: next time you write for i in range(...):, pause a second to appreciate the developers who abstracted away this complexity, freeing you to focus on your project!

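And for those still wondering about the naming: range isn't capitalized because PEP 8 sets a separate convention for built-in names. Most built-ins are single lowercase words, with CapWords used only for exception names and built-in constants. A reminder to always RTFM!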