
The Closure Problem in Python

I’ve seen several posts now from people complaining about the following Python code:

>>> funcs = []
>>> for i in range(11):
        def func(x):
            return x + i
        funcs.append(func)

>>> funcs[1](5)
15

Most people looking at this code for the first time would expect funcs[1](5) to return 6, which it clearly does not, but many of the explanations I’ve found for why are confusing and some are just wrong. I wanted to clarify my own understanding of the issue and hopefully give others a simple, clear explanation. Hopefully this is not one of those wrong explanations, and if someone who didn’t just learn about lexical scoping today wants to correct me, please do. I may also be playing a little fast and loose with some terminology, so let me know if anything is confusing.
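It isn’t just funcs[1] that misbehaves. The short check below (Python 3 syntax, rebuilding the same list) calls every function with the same argument: the eleven functions are distinct objects, yet they all return 15, because each one adds whatever i holds at the moment it is called, and i is left at 10 once range(11) is exhausted.

```python
funcs = []
for i in range(11):
    def func(x):
        return x + i
    funcs.append(func)

# Eleven distinct function objects were created...
print(len(set(funcs)))  # 11

# ...but every one of them adds the same, final value of i.
results = [f(5) for f in funcs]
print(results)  # [15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15]
print(i)        # 10, the last value the loop assigned
```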

The surprise is the result of two non-obvious Python features:

for loops do not create their own isolated scope

This is clearly demonstrated with:

>>> for i in range(10):
        j = 5

>>> j
5
>>> i
9

Here you can see that not only the loop variable i but also j is still around after the for loop, with i holding its last value, 9. This seems like a somewhat questionable design decision to me, but perhaps that’s because I’m primarily a C developer.
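The same is true inside a function body: the loop simply assigns to i and j as ordinary local variables of the enclosing function. The sketch below (Python 3; loop_then_inspect and loop_over_nothing are names made up for illustration) shows both names still bound after the loop, and also that a loop whose body never runs leaves i unassigned, since the for statement assigns to i on each iteration rather than declaring it.

```python
def loop_then_inspect():
    for i in range(10):
        j = 5
    # The loop created no scope of its own, so i and j are plain locals
    # of loop_then_inspect and still hold their last values here.
    return sorted(locals().items())


def loop_over_nothing():
    for i in range(0):
        pass
    # The body never ran, so i was never assigned.
    try:
        return i
    except NameError as err:  # UnboundLocalError is a subclass of NameError
        return type(err).__name__


print(loop_then_inspect())  # [('i', 9), ('j', 5)]
print(loop_over_nothing())  # UnboundLocalError
```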

Expressions within the body of a function are evaluated when the function is called, not when it is defined

This is also clearly demonstrated with a simple bit of code:

>>> def func(x):
        return x + i

>>> func(5)
Traceback (most recent call last):
  File "", line 1, in <module>
  File "", line 2, in func
NameError: global name 'i' is not defined
>>> i = 3
>>> func(5)
8
>>> i = 4
>>> func(5)
9

So you can see that even though i didn’t exist when the function was defined, the interpreter happily let us use it in the body of the function, and only tried to look it up when the function was actually called.
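Peeking at the compiled function makes this concrete. In CPython, the function’s code object records only the name i (in __code__.co_names, the non-local names its bytecode refers to); the value is fetched from the module’s globals on every call. A small sketch, assuming CPython and Python 3:

```python
def func(x):
    return x + i  # i is not defined yet; only its name is recorded


# Only the *name* 'i' was stored when func was defined.
print(func.__code__.co_names)  # ('i',)

observed = []
for value in (3, 4, 10):
    i = value                  # rebind the global that func looks up
    observed.append(func(5))   # the lookup happens here, on each call

print(observed)  # [8, 9, 15]
```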

Putting these two things together, it’s clear why Python evaluates our first code example the way it does. The i inside the body of each function is just a name; it isn’t tied to any particular value of the loop variable. Nothing is looked up until a function from the list is actually called, and at that point Python looks up the name i in the global namespace and finds the one that happens to have ‘leaked out’ of the for loop, which by then holds its final value, 10.
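Since the lookup is by name and happens at call time, rebinding the module-level i after the loop changes what the functions in the list return, and deleting it makes the lookup fail outright. A sketch in Python 3 syntax, reusing the loop from the top of the post:

```python
funcs = []
for i in range(11):
    def func(x):
        return x + i
    funcs.append(func)

before = funcs[1](5)  # 15: i is 10 once the loop finishes
i = 100               # rebind the same global name the functions look up
after = funcs[1](5)   # 105
del i                 # remove the global name entirely

try:
    funcs[1](5)
    lookup_failed = False
except NameError:
    lookup_failed = True  # no global i left to find

print(before, after, lookup_failed)  # 15 105 True
```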

Solutions to this issue usually involve forcing the variable to be evaluated, so that the function holds on to a value instead of a name. My favorite version adds the variable as a default argument value, which forces evaluation at definition time, since default values are computed once, when the def statement runs. For example:

>>> funcs = []
>>> for i in range(11):
        def func(x, inc=i):
            return x + inc
        funcs.append(func)

>>> funcs[1](5)
6

In this version, i is evaluated each time a new copy of func is defined, and its current value is stored as the default for inc.
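Another way to force evaluation, not the one used above, is functools.partial from the standard library, which stores the current value of i as a pre-bound argument when each partial object is created. The sketch below (Python 3; add is a hypothetical helper) builds both versions side by side. It also shows one caveat of the default-argument trick: inc is still a real parameter, so a caller can override the captured value by passing a second argument.

```python
from functools import partial


def add(inc, x):
    # Hypothetical helper: the first argument is the value to capture.
    return x + inc


partial_funcs = []
default_funcs = []
for i in range(11):
    # partial evaluates i right now and stores the resulting value.
    partial_funcs.append(partial(add, i))

    # The default-argument version from the text, for comparison.
    def func(x, inc=i):
        return x + inc
    default_funcs.append(func)

print(partial_funcs[1](5))       # 6
print(default_funcs[1](5))       # 6
# Caveat: inc is still an ordinary parameter, so it can be overridden.
print(default_funcs[1](5, 100))  # 105
```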

Now we’re cooking with Closures

OK, so things get a little more complicated when we need to pin down exactly where the interpreter looks when it finds a variable in a function that isn’t local to that function (a ‘free’ variable). As mentioned above, the interpreter looks up the value at the time of the call, but it looks in the scope in which the function was defined. More code!

>>> def inc(x):
        return x + y

>>> def outer():
        y = 3
        print(inc(4))

>>> outer()
Traceback (most recent call last):
  File "", line 1, in <module>
  File "", line 3, in outer
  File "", line 2, in inc
NameError: global name 'y' is not defined
>>> y = 1
>>> outer()
5

So here we see that when inc is called from within outer, the interpreter looks up y in the scope where inc was originally defined (the global scope), not in the scope it was called from. Once we define y where inc can reach it, all is well and happy. This is where closures come into play, because what if the scope in which the function was defined no longer exists? In a language without closure support we’d be out of luck, but with closures those references stick around after the function that created the scope finishes executing.
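To make the ‘references stick around’ part concrete, here is a sketch (Python 3; make_inc and make_adder are illustrative names, not from the text) in which inc is defined inside the enclosing function and returned. The enclosing call has finished by the time inc runs, yet y is still reachable through the function’s closure cell. The same mechanism gives another fix for the loop at the top of the post: call a factory once per iteration, so each returned function closes over its own fresh scope.

```python
def make_inc():
    y = 3
    def inc(x):
        return x + y  # y is a free variable, resolved in make_inc's scope
    return inc


inc = make_inc()  # make_inc has already returned at this point...
print(inc(4))     # 7: ...but y survives in a closure cell
print(inc.__code__.co_freevars)          # ('y',)
print(inc.__closure__[0].cell_contents)  # 3


def make_adder(step):
    # Each call creates a new scope with its own binding of step.
    def add(x):
        return x + step
    return add


funcs = []
for i in range(11):
    funcs.append(make_adder(i))

print(funcs[1](5))  # 6
```

Unlike the default-argument version, the captured value here is not a parameter of add, so a caller can’t override it by accident.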

© Spencer Russell. Built using Pelican. Theme by Giulio Fidente on GitHub.