I’ve seen several posts now from people complaining about the following Python code:
>>> funcs = []
>>> for i in range(11):
...     def func(x):
...         return x + i
...     funcs.append(func)
...
>>> funcs[1](5)
15
Most people looking at this code for the first time would expect
funcs[1](5) to be 6, which it clearly is not, but I’ve found many confusing
and sometimes just plain wrong explanations of why. I wanted to clarify my own
understanding of the issue and hopefully provide simple, clear reasoning
for others. Hopefully this is not one of those wrong explanations; if
someone who didn’t just learn about lexical scoping today wants to correct me,
please do. I may also be playing a little fast and loose with some terminology,
so let me know if anything is confusing.
The surprise is the result of two non-obvious Python features:
for loops do not create their own isolated scope
This is clearly demonstrated with:
>>> for i in range(10):
...     j = 5
...
>>> j
5
>>> i
9
Here you can see that not only is the loop variable i still around after the
for loop, but j is as well. This seems like a somewhat questionable design
decision to me, but perhaps that’s because I’m primarily a C developer.
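To make the contrast concrete, here is a small sketch of my own (written so it runs on Python 3; the names set_local and only_in_function are just made up for illustration). It shows that a function body does get its own scope, while the body of a for loop does not:

```python
def set_local():
    # This name lives only in set_local's own scope.
    only_in_function = 5
    return only_in_function

for i in range(10):
    # No new scope here: i and j are bound in the enclosing scope.
    j = 5

returned = set_local()
leaked = (i, j)  # both names survive the loop

try:
    only_in_function  # never bound outside set_local
    function_local_visible = True
except NameError:
    function_local_visible = False

print(leaked)
print(function_local_visible)
```

So the loop leaves i and j behind, while the function’s local name is not reachable from outside the function.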
Expressions within the body of a function are evaluated when the function is called, not when it is defined
This is also clearly demonstrated with a simple bit of code:
>>> def func(x):
...     return x + i
...
>>> func(5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in func
NameError: global name 'i' is not defined
>>> i = 3
>>> func(5)
8
>>> i = 4
>>> func(5)
9
So you can see that even though i didn’t exist when the function was defined,
the interpreter happily allowed us to use it in the body of the function, and
only attempted to evaluate it when the function was actually called.
Putting these two things together, it’s clear why Python evaluates our first
code example the way it does. The variable i inside the body of the function
has nothing to do with the loop variable i until the moment that a function
in the list is actually called, at which point Python looks up the name i in
the symbol table and finds the one that happens to have ‘leaked out’ of the
for loop. By then the loop has finished, so i is 10 for every function in the
list.
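One way to convince yourself of this is to check that every function in the list gives the same answer, and that rebinding i after the loop changes all of them at once. This is just a sketch of that experiment as a script (Python 3 syntax; the names after_loop and after_rebind are mine):

```python
funcs = []
for i in range(11):
    def func(x):
        return x + i  # i is looked up when func is called
    funcs.append(func)

# The loop is done and i is 10, so every function adds 10.
after_loop = [f(5) for f in funcs]

# Rebinding the same name changes what every function sees.
i = 100
after_rebind = funcs[1](5)

print(after_loop)
print(after_rebind)
```

All eleven functions share the one name i; none of them remembers the value it had when they were defined.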
Solutions to this issue usually involve forcing evaluation of the variable so that the function references the value instead of the name. My favorite version adds the variable as a default argument value, which forces evaluation at definition time. For example:
>>> funcs = []
>>> for i in range(11):
...     def func(x, inc=i):
...         return x + inc
...     funcs.append(func)
...
>>> funcs[1](5)
6
In this version i is evaluated and the value stored in inc when each copy
of func is defined.
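Another way to force early evaluation, which I’m adding here only as a possible alternative, is functools.partial from the standard library. The helper add and the variable names below are my own; the sketch (Python 3 syntax) also shows one wrinkle of the default-argument trick, namely that a caller can override the default by passing a second argument:

```python
from functools import partial

def add(inc, x):
    return x + inc

# partial(add, i) captures the current value of i, not the name.
funcs = [partial(add, i) for i in range(11)]
result = funcs[1](5)

# The default-argument version, for comparison.
defaults = []
for i in range(11):
    def func(x, inc=i):
        return x + inc
    defaults.append(func)

normal_call = defaults[1](5)      # uses the stored default, inc=1
overridden = defaults[1](5, 100)  # caller replaces inc with 100

print(result, normal_call, overridden)
```

Whether that second point matters depends on who is calling your functions, but it is worth knowing that inc is part of the function’s signature.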
Now we’re cooking with Closures
OK, so things get a little more complicated when we need to specify where exactly the interpreter looks when it finds a variable in your defined function that’s not local to the function (a ‘free’ variable). As mentioned above, the interpreter looks up a value at the time of the call, but it looks in the scope within which the function was defined. More code!
>>> def inc(x):
...     return x + y
...
>>> def outer():
...     y = 3
...     print inc(4)
...
>>> outer()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in outer
  File "<stdin>", line 2, in inc
NameError: global name 'y' is not defined
>>> y = 1
>>> outer()
5
So here we see that when inc is called within the outer scope, the
interpreter looks up y within the original scope where inc was defined, not
the scope from which it was called. After we define y where it is in reach of
inc, all is well and happy. This is where closures come into play, because
what if the scope in which the function was defined no longer exists? In a
language without closure support we’d be out of luck, but with closures those
references stick around after the function that created the scope finishes
executing.
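Here is a minimal sketch of that last point (Python 3 syntax; make_inc, inc3 and inc10 are names I made up for the example). The inner function is defined inside make_inc, so its free variable y refers to make_inc’s scope, and that reference is still usable after make_inc has returned:

```python
def make_inc(y):
    def inc(x):
        return x + y  # y is a free variable from make_inc's scope
    return inc

inc3 = make_inc(3)
inc10 = make_inc(10)

# make_inc has already returned, yet each inc still finds its own y.
seven = inc3(4)
fourteen = inc10(4)

# The kept references can be inspected on the function object.
free_names = inc3.__code__.co_freevars
kept_value = inc3.__closure__[0].cell_contents

print(seven, fourteen, free_names, kept_value)
```

Each call to make_inc creates a fresh scope with its own y, which is why inc3 and inc10 don’t step on each other the way the functions in the original loop did.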