Generator Expressions

We have a file filled with the launch codes for every missile in the world. As you can imagine, the file is huge: there are an enormous number of missiles in the world. We want to go through it and fire a US-owned missile at Russia; this way, the Russians blame the Americans, start World War III, and somehow we end up ahead.

The lines are in this format:

Country Code of Origin : Country Code of Destination : Secret Code

With our current knowledge of Python, we could easily do something like this:

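with open('secret_codes.txt', 'r') as f:
    codes = f.readlines()  # reads every line of the file into one list
    world_war_3 = [c for c in codes
            if c.split(':')[0] == 'USA' and c.split(':')[1] == 'RUS']
    for line in world_war_3:
        # strip() drops the trailing newline that readlines() keeps
        print line.split(':')[-1].strip()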

Not bad! That’s some pretty advanced code. We have a context manager to open the file, and we use a list comprehension to quickly go through the lines in the file and pick out the ones we want. We then print just the missile codes so that we can enter them.

But there’s a problem in our super secret spy plan. The file is huge. Imagine that our computer is a laptop (as spies often use), and this laptop has 8 GB of memory. In reality, it probably has way less. The line codes = f.readlines() forces Python to read the entire file all at once and store the entire thing in the variable codes.

We see the problem now. If the file is larger than 8 GB, we’re out of luck! So what can we do? We obviously don’t need the entire file all at once; at most, we’re using one line at a time. So we should read the file one line at a time. This way we only ever hold about one line in memory, no matter how big the file is.

with open('secret_codes.txt', 'r') as f:
    for line in f:
        # strip() removes the trailing newline so it doesn't end up in code
        origin, dest, code = line.strip().split(':')
        if origin == 'USA' and dest == 'RUS':
            print code

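Wow! That was awesome. How did Python know to iterate through the file line by line? A file object is an iterator: it hands back one line each time the for loop asks for the next value. To write our own functions that produce values on demand like this, enter yield.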

Yield

An iterable is anything you can use the for ... in construct on: lists, strings, files, and so on are all iterable.

An iterator is what allows you to iterate over something; it defines how you iterate, and when you stop. A generator is a type of iterator.

Yield is much like return, except that a function containing yield returns a generator object when you call it, and each yield hands one value to whoever is iterating over that generator.

def squares(n):
    for i in xrange(n):
        yield i * i

for i in squares(10):
    print i

As you expect, the output is

0
1
4
9
16
25
36
49
64
81

Here is an excellent resource to explain the yield statement if you are still confused.

As explained in the link posted above, “To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, this is a bit tricky :-). Then, your code will be run each time the for uses the generator.”

“The first time the for calls the generator object created from your function, it will run the code in your function from the beginning until it hits yield, then it’ll return the first value of the loop. Then, each other call will run the loop you have written in the function one more time, and return the next value, until there is no value to return.”

When does it stop? A generator is empty once the function does not hit yield anymore; i.e. once the function exits, the generator is considered empty.
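Here is a small sketch of that behavior, stepping through a generator by hand with next() instead of a for loop. The log list is something added here purely to record when the function body runs, and range is used in place of xrange so the sketch also runs on Python 3.

```python
log = []  # records when the function body actually runs


def squares(n):
    log.append('started')
    for i in range(n):
        yield i * i
    log.append('finished')


gen = squares(3)
started_on_call = list(log)   # [] : calling the function ran none of its body

first = next(gen)             # runs the body up to the first yield
log_after_first = list(log)   # ['started']

rest = list(gen)              # resumes after the yield until the function exits
# Now first == 0, rest == [1, 4], and log == ['started', 'finished'].

# The generator is empty; asking for another value raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
```

A for loop does the same thing under the hood: it keeps asking for the next value and stops quietly when StopIteration is raised.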

Generator Expressions

Generator expressions are a compact way to build a generator without writing a whole function with yield statements. The principal benefit, once again, is to minimize memory usage. If you don’t need everything all at once, you should use a generator expression rather than a list comprehension.
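To get a rough feel for the memory difference, here is a sketch that compares the size of a list with the size of a generator object using sys.getsizeof. The exact byte counts vary by Python version and platform, and getsizeof only measures the container itself, so treat this as an illustration rather than a benchmark.

```python
import sys

n = 100000

# Note: on Python 2, range(n) itself builds a list; xrange(n) avoids that.
squares_list = [x * x for x in range(n)]   # builds all n values up front
squares_gen = (x * x for x in range(n))    # computes nothing yet

list_size = sys.getsizeof(squares_list)    # grows with n
gen_size = sys.getsizeof(squares_gen)      # small, and does not depend on n

print(gen_size < list_size)  # True
```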

So when would you have to use a list comprehension? If you need to concatenate the output with a list, or if you need to index into the result, you need a list comprehension.

But in most other cases, a generator expression will suffice.
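A quick sketch of those two cases: indexing and concatenation work on a list, but both raise TypeError on a generator. The variable names are just for illustration.

```python
squares_list = [x * x for x in range(5)]
squares_gen = (x * x for x in range(5))

third = squares_list[2]              # 4
combined = [100] + squares_list      # [100, 0, 1, 4, 9, 16]

# Generators support neither indexing nor list concatenation.
try:
    squares_gen[2]
    index_error = None
except TypeError as exc:
    index_error = exc

try:
    [100] + squares_gen
    concat_error = None
except TypeError as exc:
    concat_error = exc

# If a list turns out to be needed after all, convert explicitly.
as_list = list(squares_gen)          # [0, 1, 4, 9, 16]
```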

Generators are iterators, but you can only iterate over them once. That’s because they do not store all the values in memory; they generate the values on the fly.

Syntax-wise, a generator expression is exactly like a list comprehension, but with parens instead of brackets: ( ) instead of [ ].
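The iterate-only-once rule is easy to trip over, so here is a minimal sketch of what it looks like in practice. The second pass over the same generator comes back empty, with no error to warn you.

```python
evens = (x for x in range(10) if x % 2 == 0)

first_pass = list(evens)    # [0, 2, 4, 6, 8]
second_pass = list(evens)   # [] : the generator is already exhausted

# To iterate again, build a fresh generator (or keep a list instead).
evens_again = (x for x in range(10) if x % 2 == 0)
total = sum(evens_again)    # 20
```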

Here’s a great example. Write a function that takes one integer as its parameter and returns the sum of the squares from one up to that integer. So if I call sum_squares(3), it calculates 1 * 1 + 2 * 2 + 3 * 3 and returns 14.

Like many things in Python, this can be done in a single line.

def sum_squares(n):
    return sum([x*x for x in range(1, n + 1)])

But do we really need the entire list all at once? Of course not. So let’s swap out the list comprehension for a generator expression.

def sum_squares(n):
    return sum((x*x for x in range(1, n + 1)))

That’s kind of ugly: the first set of parens is due to sum, and the second set is due to the generator expression. Because of this ugliness, Python lets you drop the extra parens when a generator expression is the only argument to a function call. So it can actually just look like this:

def sum_squares(n):
    return sum(x*x for x in range(1, n + 1))
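As a quick sanity check, here is a sketch that compares the generator-expression version against the well-known closed-form formula for the sum of squares. The helper sum_squares_closed_form is introduced here just for the comparison, and the range runs from 1 to n inclusive to match the "one up to that integer" wording.

```python
def sum_squares(n):
    # range(1, n + 1) covers 1 through n inclusive
    return sum(x * x for x in range(1, n + 1))


def sum_squares_closed_form(n):
    # Standard identity: 1^2 + 2^2 + ... + n^2 = n(n + 1)(2n + 1) / 6
    return n * (n + 1) * (2 * n + 1) // 6


results = [sum_squares(n) for n in (0, 1, 3, 10)]   # [0, 1, 14, 385]
```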

Pretty cool! As you look over the places where you use list comprehension, you’ll start to realize how often generator expressions could have been used instead!

Here’s another example. We have a file with many lines; find the length of the longest line in the file.

with open(filename, 'r') as f:  # f rather than file, which is a built-in name in Python 2
    print max(len(line) for line in f)
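Here is a self-contained sketch of the longest-line idea that you can run without a file of your own. The helper name longest_line_length and the sample text are made up for illustration. Unlike the one-liner above, it strips the trailing newline before measuring, which is a choice made here and not something the original requires.

```python
import os
import tempfile


def longest_line_length(path):
    with open(path, 'r') as f:
        # Strip the trailing newline so it is not counted in the length.
        lengths = (len(line.rstrip('\n')) for line in f)
        # The generator reads from f lazily, so it has to be consumed
        # while the file is still open, i.e. inside the with block.
        return max(lengths)


# Write a small sample file, measure it, then clean up.
handle, path = tempfile.mkstemp(suffix='.txt')
os.close(handle)
try:
    with open(path, 'w') as out:
        out.write("short\na much longer line\nmid\n")
    longest = longest_line_length(path)   # 18
finally:
    os.remove(path)
```

One thing to watch for: max raises ValueError when the generator produces no values, so an empty file needs its own handling.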

Other Comprehensions

There are actually two more types of comprehensions in Python: set comprehension and dictionary comprehension.

Set comprehensions create sets, and dictionary comprehensions create dictionaries.

These were introduced in Python 3.0 and backported to Python 2.7, so they do work in this course’s version of Python, but we won’t be discussing them here. Feel free to look them up if you’re curious. The syntax is pretty much exactly the same as the other comprehensions, only they use braces ({ }) instead of parens or brackets, and a dictionary comprehension puts key: value in front of the for.
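For the curious, here is a minimal sketch of the brace syntax. The word list is just sample data.

```python
words = ['spy', 'code', 'spy', 'missile', 'code']

# Set comprehension: duplicates collapse automatically.
unique_lengths = {len(w) for w in words}        # a set containing 3, 4 and 7

# Dictionary comprehension: key: value before the for.
length_by_word = {w: len(w) for w in words}     # one entry per distinct word
```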

Further Resources

David Beazley is my personal Python God, and he has much wisdom to share with you. He is a master of coroutines, subroutines, parallelism in Python, and many, many other things.

Here is his first lecture on generators - there are many more, and he is a repeat speaker at PyCon. Seriously, learn his stuff. He is incredible.