user1176501

List comprehension optimization

I managed to condense 8 lines of code into just 2.

The first list comprehension collects the folders, and the second collects the files that match a set of extension filters:

hideTheseFolders=[".thumb",".mayaSwatches","RECYCLER","$AVG"]
fileFilters=["ma","jpg","png","mb",'iff','tga','tif']
newLst=[]
import os
locationTxt=r"E:\box\scripts"
[newLst.append(each) for each in os.listdir(locationTxt)  if os.path.isdir(os.path.join(locationTxt,each)) and each not in hideTheseFolders]
[newLst.append(os.path.basename(os.path.join(locationTxt,each))) for nfile in fileFilters for each in os.listdir(locationTxt) if each.endswith(nfile)]

The last two lines both scan the same directory (locationTxt), so there is probably a way to merge them into one. Any suggestions?

Upvotes: 5

Views: 2723

Answers (4)

Benjamin Hodgson

Reputation: 44634

List comprehensions are not a technique for optimisation. When the Python compiler sees a list comprehension, it breaks it down into a for loop. Look at bytecode 13 (FOR_ITER):

In [1]: from dis import dis

In [2]: code = "[i for i in xrange(100)]"

In [3]: dis(compile(code, '', 'single'))
  1           0 BUILD_LIST               0
              3 LOAD_NAME                0 (xrange)
              6 LOAD_CONST               0 (100)
              9 CALL_FUNCTION            1
             12 GET_ITER            
        >>   13 FOR_ITER                12 (to 28)
             16 STORE_NAME               1 (i)
             19 LOAD_NAME                1 (i)
             22 LIST_APPEND              2
             25 JUMP_ABSOLUTE           13
        >>   28 POP_TOP             
             29 LOAD_CONST               1 (None)
             32 RETURN_VALUE      

The fact that a list comprehension is the same as a for loop can also be seen by timing it. In this case, the for loop actually worked out slightly (but insignificantly) faster:

In [4]: %timeit l = [i for i in xrange(100)]
100000 loops, best of 3: 13.6 us per loop

In [5]: %%timeit l = []; app = l.append  # optimise out the attribute lookup for a fairer test
   ...: for i in xrange(100):
   ...:     app(i)
   ...: 
100000 loops, best of 3: 11.9 us per loop  #  insignificant difference. Run it yourself and you might get it the other way around

You can therefore write any given list comprehension as a for loop with a minimal performance hit (in practice there is usually a small difference due to attribute lookup), and often a significant readability benefit. In particular, loops which have side effects should not be written as list comprehensions. Nor should you use list comprehensions that have more than about two for keywords, or which make a line longer than 70 characters or so. These aren't hard-and-fast rules, just heuristics for writing readable code.
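To make the point about side effects concrete, here is a minimal sketch. It assumes Python 3 (so range rather than xrange), and all variable names are illustrative rather than taken from the question:

```python
numbers = range(5)

# Anti-pattern: the comprehension is evaluated only for its side effect.
collected = []
discarded = [collected.append(n) for n in numbers]
# list.append returns None, so `discarded` is a list of Nones that nobody needs.

# Plain loop: same effect, and the mutation is explicit.
looped = []
for n in numbers:
    looped.append(n)

# Comprehension used as intended: the expression itself is the result.
built = [n for n in numbers]
```

All three approaches fill a list with the same values; only the first also builds a throwaway list of None values along the way.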

Don't get me wrong, list comprehensions are very useful, and can often be clearer, simpler and more concise than an equivalent for-loop-and-append. But they are not to be abused in this way.

Upvotes: 4

user1176501

I will stick to code that is more readable and avoid list comprehensions, or at least keep a readable reference version alongside whenever I do use one.

Here is what I have learned about list comprehensions so far, so that everyone can follow along.

The primary uses for comprehension are:

  • grabbing the result of an iterator (possibly with a filter) into a permanent list: files = [f for f in list_files() if f.endswith("mb")]
  • converting between iterable types: example = "abcde"; letters = [x for x in example] # this is handy for data packed into strings!
  • simple list processing: strings = [str(x) for x in list_of_numbers]
  • more complex list processing with lambdas for readability: filter_func = lambda p, q: p > q; larger_than_last = [val for val in list_of_numbers if filter_func(val, 5)]

Thank you, everyone, for your input.

An update: my own research and troubleshooting got me the exact answer I was after.

import os
filters = [[".thumb", ".mayaSwatches", "RECYCLER", "$AVG"], ["ma", "jpg", "png", "mb", 'iff', 'tga', 'tif']]
locationTxt = r"E:\box\scripts"
newLst = [each for each in os.listdir(locationTxt) if os.path.isdir(os.path.join(locationTxt, each)) and each not in filters[0]] + [each for each in os.listdir(locationTxt) if os.path.isfile(os.path.join(locationTxt, each)) and os.path.splitext(each)[-1][1:] in filters[1]]

However, as I mentioned, sticking to readable code is the way to go!
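For comparison, a readable single-pass version might look like the sketch below. It assumes Python 3. The function name list_entries and the demo directory contents are made up for illustration, and the demo runs against a temporary folder rather than E:\box\scripts:

```python
import os
import tempfile


def list_entries(location, hidden_dirs, extensions):
    """Return visible sub-folders first, then files with a wanted extension."""
    folders, files = [], []
    for name in os.listdir(location):  # the directory is scanned only once
        full_path = os.path.join(location, name)
        if os.path.isdir(full_path):
            if name not in hidden_dirs:
                folders.append(name)
        elif os.path.isfile(full_path):
            # splitext gives ".ma"; drop the dot before comparing
            if os.path.splitext(name)[1][1:] in extensions:
                files.append(name)
    return folders + files


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for sub in ("textures", ".thumb"):
        os.mkdir(os.path.join(demo_dir, sub))
    for fname in ("scene.ma", "notes.txt", "render.png"):
        open(os.path.join(demo_dir, fname), "w").close()
    result = list_entries(demo_dir, {".thumb", ".mayaSwatches"}, {"ma", "png"})
```

Here result holds the visible folder followed by the two matching files. The order of the files depends on os.listdir, which makes no ordering guarantee.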

Upvotes: 0

Fred Foo

Reputation: 363567

First off, you're abusing list comprehensions to hide loops by appending inside them; you're actually throwing away the result of the list comprehension. Second, there's no need to cram as much as possible into a single line at the expense of readability.

If you want to use list comprehensions, which is actually a pretty good idea when building lists by looping and filtering, then consider this version:

ignore_dirs = set([".thumb",".mayaSwatches","RECYCLER","$AVG"])
extensions = ["ma", "jpg", "png", "mb", 'iff', 'tga', 'tif']
location = "E:\\box\\scripts"

filelist = [fname for fname in os.listdir(location)
                  if fname not in ignore_dirs
                  if os.path.isdir(os.path.join(location, fname))]
filelist += [os.path.basename(fname)
             for fname in os.listdir(location)
             if any(fname.endswith(ext) for ext in extensions)]

Note that there are still two comprehensions, because you seem to be building a list that logically consists of two kinds of items. There's no need to try and do that in a single expression, although you could have used two comprehensions with a + in between them instead of the += statement.

(I took the liberty of renaming the variables to reflect what they represent.)
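As a side note on the any(...) test in the second comprehension: str.endswith also accepts a tuple of suffixes, so the same filter can be written without the inner generator. The sketch below assumes Python 3 and uses made-up file names. It also shows that matching on "ma" without a leading dot accepts any name that merely ends in those letters:

```python
extensions = ["ma", "jpg", "png", "mb", "iff", "tga", "tif"]
names = ["scene.ma", "render.png", "readme.txt", "llama"]  # hypothetical listing

# Equivalent filters: a generator inside any() versus a tuple of suffixes.
with_any = [n for n in names if any(n.endswith(ext) for ext in extensions)]
with_tuple = [n for n in names if n.endswith(tuple(extensions))]

# Stricter variant: require the dot so that "llama" no longer matches "ma".
dotted = tuple("." + ext for ext in extensions)
strict = [n for n in names if n.endswith(dotted)]
```

Whether the stricter variant is wanted depends on what the folder actually contains. The original filter has the same looseness as with_any.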

Upvotes: 4

lenik

Reputation: 23518

The main suggestion is to get a decent Python book and read it well. Judging from your code you have no idea how list comprehensions work, still you managed to cram 8 readable lines of code into 2 overly long and incomprehensible ones.

You should write programs that are easy to read:

  • newlines are your friends, use them
  • spaces are your friends too
  • lines should fit on the screen (<50 characters)
  • put imports at the beginning of the file
  • read a Python book

Just in case you're wondering, here's what your code should look like:

import os

path = 'e:/box/scripts'
hideTheseFolders = [".thumb", ".mayaSwatches", "RECYCLER", "$AVG"]
# str.endswith() accepts a tuple of suffixes, not a list
fileFilters = ("ma", "jpg", "png", "mb", "iff", "tga", "tif")

newLst = []
for root, dirs, files in os.walk(path):
    # add folders
    newLst.extend(d for d in dirs if d not in hideTheseFolders)

    # add files
    newLst.extend(f for f in files if f.lower().endswith(fileFilters))

    break    # don't descend into subfolders

# convert to the full path or whatever you need here
newLst = [os.path.join(path, name) for name in newLst]

Upvotes: 0
