A while ago, a friend and I got into a debate about the type of Python's built-in range. I'll admit I had never really looked into it. And why would I? It's such a simple thing you learn at the start of your Python journey! I assumed, based on its behavior and its lowercase first letter, that it was a simple function you'd call with start, stop, and step arguments to get back a list of numbers. Simple.
My friend firmly disagreed. He told me to look it up, and he had a weighty argument: Python's official documentation. There it is written, plain as day: class range([...]). Not a function. A class. An object. Oops.
So he was right, obviously. But while I gracefully conceded, I got curious. Why redesign range as an object? What is the point of instantiating an object if you simply need to return a list?
1. Poking around in Python
Python 2: Where I Was (Kind Of) Right
Turns out my initial assumption wasn't wrong, just a bit outdated. In Python 2, range() was indeed a function returning a list, which you can check very easily:
print(type(range))
print(type(range(10)))
<type 'builtin_function_or_method'>
<type 'list'>
Want all the numbers from 0 to 4? Easy. range(5) creates a list for you and returns it: [0, 1, 2, 3, 4]. It just works... until you need something big. Try range(10**8) (that's 100 million numbers...). Python 2 would compute all 100,000,000 integers and store them in memory at once (which takes a bit more than 6GB on my machine!).
Python 2 has a workaround, namely xrange(), which produces its values lazily instead of building a list. But for Python 3, the developers streamlined things: they dropped the old list-returning range() function entirely and made lazy generation the new standard, thanks to the range object.
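If you are on Python 3 and want to see the difference between the two behaviors, wrapping a range in list() gives roughly what Python 2's range() used to return. A minimal sketch (the variable names are mine):

```python
# Python 3: range() is lazy; list() materializes every value,
# which is roughly what Python 2's range() did on its own.
lazy = range(5)
eager = list(lazy)

print(lazy)   # range(0, 5)
print(eager)  # [0, 1, 2, 3, 4]
```

The lazy object only describes the sequence; the list actually holds all five integers.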
Python 3: Where I Was (Really) Wrong
In Python 3, range is a class whose instances have their own type, which is really easy to check:
print(type(range(10)))
<class 'range'>
Let's probe it a bit further.
help(range(10))
class range(object)
 |  range(stop) -> range object
 |  range(start, stop[, step]) -> range object
 |
 |  Return an object that produces a sequence of integers from start (inclusive)
 |  to stop (exclusive) by step. [...]
 |
 |  Methods defined here:
 |
 |  __bool__(self, /)
 |      True if self else False
 |
 |  __contains__(self, key, /)
 |      Return bool(key in self).
 |
 |  __iter__(self, /)
 |      Implement iter(self).
 |
 |  __len__(self, /)
 |      Return len(self).
 |
 |  __repr__(self, /)
 |      Return repr(self).
 |
 |  __reversed__(self, /)
 |      Return a reverse iterator.
 |  [...]
Notice the __iter__() method? That’s what makes the range object an iterable. Just like with a list or a string, Python (or you) can call this method to get an iterator: a special helper object designed to help you loop over the iterable that it was created from. Iterators in Python work by implementing a __next__() method, which returns the next element each time it’s called.
Still a bit fuzzy? It's quite uncommon to have to explicitly instantiate an iterator, but let’s do it for the sake of learning.
it = range(10).__iter__()
help(it)
class range_iterator(object)
 |  Methods defined here:
 |
 |  __iter__(self, /)
 |      Implement iter(self).
 |
 |  __length_hint__(self, /)
 |      Private method returning an estimate of len(list(it)).
 |
 |  __next__(self, /)
 |      Implement next(self).
 |  [...]
And now, to use it:
print(it.__next__())
0
print(it.__next__())
1
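In everyday code you would rarely call the dunder methods directly; the built-ins iter() and next() do the same job. Here is a small sketch (variable names are mine) that also shows what happens once the iterator runs out of values:

```python
numbers = range(3)          # an iterable, not an iterator
iterator = iter(numbers)    # same as numbers.__iter__()

first = next(iterator)      # same as iterator.__next__()
second = next(iterator)
third = next(iterator)

# A fourth call has nothing left to return, so it raises StopIteration.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)  # 0 1 2 True
```

That StopIteration exception is how an iterator tells its caller that the loop is over.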
So why all this complexity? Two main reasons: ease of use and memory efficiency.
First, ease of use. Each time you call __iter__() on an iterable such as a range or a list, Python returns a new, independent iterator. This means the same iterable can be reused multiple times without being “consumed.” For example, in range's case:
x = range(10)
for i in x:
    print(i)  # Works
for i in x:
    print(i)  # Still works!
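To see why this matters, it may help to compare a range with something that is consumed by iteration. A generator expression is its own iterator, so a second pass over it yields nothing. A minimal sketch (names are mine):

```python
reusable = range(3)                 # iterable: hands out a fresh iterator each time
one_shot = (n for n in range(3))    # generator: it is its own iterator

first_pass = list(reusable)
second_pass = list(reusable)

gen_first_pass = list(one_shot)
gen_second_pass = list(one_shot)    # the generator is already exhausted

print(first_pass, second_pass)          # [0, 1, 2] [0, 1, 2]
print(gen_first_pass, gen_second_pass)  # [0, 1, 2] []
```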
Second, memory efficiency. As the docs point out:
The advantage of the range type over a regular list [...] is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents.
Remember that range(10**8) disaster in Python 2? Watch how Python 3 handles it:
from sys import getsizeof
print(getsizeof(range(10**8)))
48
Yep! Just 48 bytes (on my 64-bit build; the exact figure can vary between interpreters). The same memory footprint as range(10) or range(10**10). Under the hood, a range object stores only a handful of values and computes each item on the fly as you iterate. No precomputed lists, no crazy memory allocation. Pretty neat, right?
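You can check both claims yourself. The sketch below compares footprints instead of hard-coding a byte count, since the exact number depends on the interpreter build, and then pulls single items out of a ten-billion-element range without ever building it (variable names are mine):

```python
from sys import getsizeof

small = range(10)
huge = range(10**10)

# Compare rather than hard-code: byte counts are build-specific.
same_footprint = getsizeof(small) == getsizeof(huge)

# Items are derived from start and step on demand, so indexing
# and integer membership tests never materialize the sequence.
item = huge[123_456_789]
last = huge[-1]
found = 9_999_999_999 in huge

print(same_footprint, item, last, found)  # True 123456789 9999999999 True
```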
2. Under the Hood: CPython Implementation
The reference Python interpreter is CPython. If you're curious about what’s really going on under the hood, you can take a peek at its source code. But fair warning: it’s written in C, and it’s not exactly light reading. You'll find object-oriented patterns awkwardly wedged into a language that doesn’t natively support them, along with manual memory management (yep, C has no garbage collector!) and handling of low-level integer types (int and long). On top of that, the developers included several optimizations to handle large ranges efficiently.
Rather than going into every detail, let’s focus on a few key parts. First up: the struct that defines the internal representation of a range object.
typedef struct {
    PyObject_HEAD
    PyObject *start;
    PyObject *stop;
    PyObject *step;
    PyObject *length;  // Precomputed length
} rangeobject;
These are what you’d think of as the attributes of a range object in OOP terms. Take note of what’s here, and more importantly, what’s not: just a few variables, no massive array of precomputed values needed.
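Most of these fields can be observed from Python itself: start, stop and step are exposed as read-only attributes, and the length is what len() reports. A quick sketch (the example range is arbitrary):

```python
r = range(2, 20, 3)

# The three defining values are available as read-only attributes...
print(r.start, r.stop, r.step)  # 2 20 3

# ...and the precomputed length is what len() reports.
print(len(r))                   # 6

# Every item is derived from those values; nothing else is stored.
print(list(r))                  # [2, 5, 8, 11, 14, 17]
```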
Now let’s take a look at the __iter__() method implementation. (I've simplified the code and added my own comments.)
static PyObject *range_iter(PyObject *seq) {
    /* r is roughly equivalent to self, the instance of the object we are working with */
    rangeobject *r = (rangeobject *)seq;

    /* Get attribute values from instance */
    long lstart, lstop, lstep;
    unsigned long ulen;
    lstart = PyLong_AsLong(r->start);
    lstop = PyLong_AsLong(r->stop);
    lstep = PyLong_AsLong(r->step);

    /* Calculate range's length */
    ulen = get_len_of_range(lstart, lstop, lstep);

    return fast_range_iter(lstart, lstop, lstep, (long)ulen);
}

static PyObject *fast_range_iter(long start, long stop, long step, long len) {
    /* Allocates space for a new _PyRangeIterObject (roughly like instantiating the object) */
    _PyRangeIterObject *it = _Py_FREELIST_POP(_PyRangeIterObject, range_iters);

    /* Initialize attributes in the new iterator */
    it->start = start;
    it->step = step;
    it->len = len;

    /* Returns the new iterator */
    return (PyObject *)it;
}
Don't be surprised to see a call to a function that’s defined after it's used. That’s perfectly legal in C, as long as the function's prototype has been declared beforehand (typically in a header file or earlier in the same file). As we discovered earlier when poking around in Python, every call to __iter__() creates a fresh iterator. That iterator stores the start and step values but, interestingly, not the stop. Instead, the total number of steps (i.e., the length) is computed ahead of time and stored in the iterator when it is created.
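The simplified listing assumes that start, stop and step all fit in a C long. When they don't, CPython hands back a different, slower iterator type that works with arbitrary-precision integers. One way to observe this from Python is sketched below; the type names are a CPython implementation detail and may well differ on other interpreters:

```python
small_it = iter(range(10))
big_it = iter(range(2**100, 2**100 + 3))

# On CPython, these are two distinct iterator types.
print(type(small_it).__name__)  # range_iterator (on CPython)
print(type(big_it).__name__)    # longrange_iterator (on CPython)

# Both behave the same way from the caller's point of view.
big_values = list(big_it)
print(big_values == [2**100, 2**100 + 1, 2**100 + 2])  # True
```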
Now let's look at the implementation of the __next__() method for this iterator.
static PyObject *rangeiter_next(PyObject *op) {
    _PyRangeIterObject *r = (_PyRangeIterObject*)op;
    if (r->len > 0) {
        long result = r->start;
        r->start = result + r->step;
        r->len--;
        return PyLong_FromLong(result);
    }
    return NULL;
}
Each call simply returns the current value of start, then adds step to it, updating start in place for the next iteration. The len field is decremented to keep track of how many values are left. Once it hits zero, the function returns NULL, signaling that the iteration is complete. And if you think about it, this approach makes a lot of sense! Relying on the stop value alone would require checking different conditions depending on whether the range is increasing or decreasing. With a precomputed counter, the check becomes much simpler: likely just a single conditional jump (something like Jump If Zero, JZ) at the assembly level.
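To make the trade-off concrete, here is a rough Python sketch of the idea: all the direction-dependent logic lives in a one-time length computation, and the loop itself only looks at a counter. The names range_length and walk are hypothetical helpers of mine, not CPython functions, and the formula is my own reconstruction rather than a copy of the C code:

```python
def range_length(start, stop, step):
    """Number of items in range(start, stop, step). A sketch, not CPython's code."""
    if step == 0:
        raise ValueError("step must not be zero")
    if step > 0 and start < stop:
        return 1 + (stop - 1 - start) // step
    if step < 0 and start > stop:
        return 1 + (start - 1 - stop) // (-step)
    return 0  # empty range


def walk(start, step, length):
    """Yield values using only a countdown, never comparing against stop."""
    while length > 0:
        yield start
        start += step
        length -= 1


print(list(walk(0, 2, range_length(0, 5, 2))))     # [0, 2, 4]
print(list(walk(10, -3, range_length(10, 0, -3)))) # [10, 7, 4, 1]
```

The same loop handles increasing and decreasing ranges without a single branch on the sign of step.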
3. Simplified Python Implementation
While CPython’s implementation is complex, we can hopefully get a better understanding of range's behavior by building a simplified, pure Python version. Interestingly, CPython’s own test suite includes an implementation of range, written in Python but as a generator. Here, we’ll try to create something that more closely mirrors the underlying C.
class SimpleRange:
    def __init__(self, start, stop=None, step=1):
        # Handle range(5) -> range(0, 5)
        if stop is None:
            self.start = 0
            self.stop = start
        else:
            self.start = start
            self.stop = stop
        self.step = step
        # Precompute number of steps - like C's get_len_of_range().
        # Rounds up so that SimpleRange(0, 5, 2) has 3 items, and never
        # goes below zero. Only positive steps are supported here.
        self.length = max(0, (self.stop - self.start + step - 1) // step)

    def __iter__(self):
        # Return a new iterator for each call
        return SimpleRangeIterator(self.start, self.step, self.length)

    def __len__(self):
        return self.length


class SimpleRangeIterator:
    def __init__(self, start, step, length):
        self.current = start
        self.step = step
        self.remaining = length

    def __iter__(self):
        # An iterator is its own iterable
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        value = self.current
        self.current += self.step
        self.remaining -= 1
        return value
Just like in the C implementation, our SimpleRange class precomputes the length when it's created. When __iter__() is called, it instantiates a fresh SimpleRangeIterator instance. Notice how this iterator doesn't need the stop value at all: it simply decrements an internal counter on each call to __next__(), and raises a StopIteration exception when that counter hits zero.
Note that our simplified version lacks many of the features of the real range (notably support for negative steps), but tying it all together, it helps to build a much deeper understanding of what's really happening when the interpreter encounters a for loop:
rng = range(5)            # range instance
rng_it = rng.__iter__()   # range_iterator instance
while True:
    try:
        i = rng_it.__next__()  # get next value
        # Do something with i
    except StopIteration:
        # rng_it.len == 0, it's the end of the range
        break
Yes. All of this in a simple:
for i in range(5):
    ...  # Do something with i
Isn't Python beautiful?
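If you'd like to convince yourself that the two forms really do the same thing, here is a small self-contained check. The helper names manual_loop and for_loop are mine, and this is only a sketch of the equivalence, not a description of how the interpreter is implemented:

```python
def manual_loop(iterable):
    """Collect items by driving the iterator protocol by hand."""
    collected = []
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break
        collected.append(item)
    return collected


def for_loop(iterable):
    """Collect items with an ordinary for loop."""
    collected = []
    for item in iterable:
        collected.append(item)
    return collected


print(manual_loop(range(5)))                        # [0, 1, 2, 3, 4]
print(manual_loop(range(5)) == for_loop(range(5)))  # True
```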
Conclusion
In closing, I'd like to thank you for joining me on this dive into range's internals, dear reader. Whether you've gained technical insights or simply been reminded how you can transform mistakes into curiosity, I hope this journey was worthwhile. And before you write your next for i in range(...): loop, please take a moment to appreciate the developers who carefully abstracted away all this complexity to let you focus on what matters to you.
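And for those still wondering about the naming: range isn't capitalized because PEP 8 describes a separate convention for built-in names. Most of them are single lowercase words, with CapWords reserved for exception names and built-in constants. A reminder to always RTFM!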