7.3. Filtering lists revisited

You're already familiar with using list comprehensions to filter lists. There is another way to accomplish this same thing, which some people feel is more expressive.

Python has a built-in filter function that takes two arguments, a function and a list, and returns a list.[14] The function passed as the first argument must itself take one argument. The list that filter returns contains every element of the original list for which that function returns a true value.

Got all that? It's not as difficult as it sounds.

Example 7.7. Introducing filter

>>> def odd(n):           1
...     return n%2
...     
>>> li = [1, 2, 3, 5, 9, 10, 256, -3]
>>> filter(odd, li)       2
[1, 3, 5, 9, -3]
>>> filteredList = []
>>> for n in li:          3
...     if odd(n):
...         filteredList.append(n)
...     
>>> filteredList
[1, 3, 5, 9, -3]
1 odd uses the modulo operator “%” to return 1 if n is odd and 0 if n is even.
2 filter takes two arguments, a function (odd) and a list (li). It loops through the list and calls odd with each element. If odd returns a true value (remember, any non-zero value is true in Python), then the element is included in the returned list, otherwise it is filtered out. The result is a list of only the odd numbers from the original list, in the same order as they appeared in the original.
3 You could accomplish the same thing with a for loop. Depending on your programming background, this may seem more “straightforward”, but functions like filter are much more expressive. Not only is it easier to write, it's easier to read, too. Reading the for loop is like standing too close to a painting; you see all the details, but it may take a few seconds to be able to step back and see the bigger picture: “Oh, we're just filtering the list!”
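
The behavior described in the notes above can be sketched as a small function. This is only an illustration of the idea, not how the built-in is implemented; the name my_filter is hypothetical. Note that in Python 3 the built-in filter returns a lazy iterator rather than a list, so the sketch wraps it in list() to compare the results.

```python
def odd(n):
    # Non-zero (true) for odd numbers, 0 (false) for even ones.
    return n % 2

def my_filter(function, sequence):
    # Hypothetical stand-in for the built-in filter: keep each element
    # for which function(element) is true, preserving the original order.
    result = []
    for element in sequence:
        if function(element):
            result.append(element)
    return result

li = [1, 2, 3, 5, 9, 10, 256, -3]
print(my_filter(odd, li))       # [1, 3, 5, 9, -3]
print(list(filter(odd, li)))    # [1, 3, 5, 9, -3]
```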

Example 7.8. filter in regression.py

    files = os.listdir(path)                       1
    test = re.compile(r"test\.py$", re.IGNORECASE) 2
    files = filter(test.search, files)             3
1 As we saw in Finding the path, path may contain the full or partial pathname of the directory of the currently running script, or it may contain an empty string if the script is being run from the current directory. Either way, files will end up with the names of the files in the same directory as this script we're running.
2 This is a compiled regular expression. As we saw in Refactoring, if you're going to use the same regular expression over and over, you should compile it for faster performance. The compiled object has a search method which takes a single argument, the string to search. If the regular expression matches the string, the search method returns a Match object containing information about the regular expression match; otherwise it returns None, the Python null value.
3 For each element in the files list, we're going to call the search method of the compiled regular expression object, test. If the regular expression matches, the method will return a Match object, which Python considers to be true, so the element will be included in the list returned by filter. If the regular expression does not match, the search method will return None, which Python considers to be false, so the element will not be included.
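
To see why a bound method like test.search works as the filtering function, here is a minimal sketch that can be run on its own. The file names are made up for illustration (in regression.py they come from os.listdir(path)), and the pattern is written as a raw string with the dot escaped, on the assumption that a literal “.py” ending is intended. The list() call is only needed in Python 3, where filter returns an iterator.

```python
import re

# Hypothetical directory listing, standing in for os.listdir(path).
files = ["apihelpertest.py", "regression.py", "ROMANTEST.PY", "test.pyc", "kgptest.py"]

test = re.compile(r"test\.py$", re.IGNORECASE)

# search returns a Match object (true) on a match and None (false) otherwise.
print(test.search("apihelpertest.py") is not None)   # True
print(test.search("regression.py"))                  # None

# filter calls test.search once per file name and keeps the names that matched.
matching = list(filter(test.search, files))
print(matching)   # ['apihelpertest.py', 'ROMANTEST.PY', 'kgptest.py']
```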

Historical note. Versions of Python prior to 2.0 did not have list comprehensions, so you couldn't filter using list comprehensions; the filter function was the only game in town. Even with the introduction of list comprehensions in 2.0, some people still prefer the old-style filter (and its companion function, map, which we'll see later in this chapter). Both techniques work, and neither is going away, so which one you use is a matter of style.

Example 7.9. Filtering using list comprehensions instead

    files = os.listdir(path)
    test = re.compile(r"test\.py$", re.IGNORECASE)
    files = [f for f in files if test.search(f)] 1
1 This will accomplish exactly the same result as using the filter function. Which way is more expressive? That's up to you.
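
A quick way to convince yourself that the two styles agree is to run both on the same input and compare. The file names below are hypothetical, and list() is used so the comparison also works in Python 3.

```python
import re

# Hypothetical file names for illustration only.
files = ["romantest.py", "README.txt", "odbchelpertest.py", "regression.py"]
test = re.compile(r"test\.py$", re.IGNORECASE)

with_filter = list(filter(test.search, files))
with_comprehension = [f for f in files if test.search(f)]

print(with_filter)                          # ['romantest.py', 'odbchelpertest.py']
print(with_filter == with_comprehension)    # True
```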

Footnotes

[14] Technically, the second argument to filter can be any sequence, including lists, tuples, and custom classes that act like lists by defining the __getitem__ special method. If possible, filter will return the same datatype as you give it, so filtering a list returns a list, but filtering a tuple returns a tuple.
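
The footnote's point about other sequence types can be tried with a short sketch. Countdown is a hypothetical class that acts like a list only by defining __getitem__. The footnote describes filter returning the same datatype it was given; in Python 3, filter always returns an iterator, so this sketch converts the result explicitly with list() or tuple(), which runs under either version.

```python
def odd(n):
    return n % 2

class Countdown:
    # Hypothetical list-like class: it supports indexing via __getitem__
    # and nothing else, which is enough for filter to loop over it.
    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index < 0 or index >= self.start:
            raise IndexError(index)
        return self.start - index

print(list(filter(odd, [1, 2, 3])))       # [1, 3]
print(tuple(filter(odd, (1, 2, 3))))      # (1, 3)
print(list(filter(odd, Countdown(5))))    # [5, 3, 1]
```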