endtask
endtask

Reputation: 55

for loop speed or alternative solution?

I know Python isn't built for speed but I would like to improve the performance of the following code:

listA = [1,2]
listB = [1,2,3,4,5,6,7,8,9,10]

# pre-allocate for speed. Appending empty list is slower?
newList = ['NaN']*len(listB)

# Do I need a loop? Can I use something faster?
for n in xrange(len(listB)):
    if listB[n] % 2 == 1:
        newList[n] = listA[0]
    else:
        newList[n] = listA[1]

My issue is listB can get pretty large.

I have already pre-allocated memory for newList and used xrange. I believe these provide significant speed increases for large lists.

But do I even need a for loop at all since each loop is not dependent on the previous result. Does python have an array type?

Can I break up listB and run the operation in parallel similar to parfor in Matlab?

ADDITIONAL INFO: For my problem, as listA gets bigger, listB gets exponentially bigger.

For each item in listB there needs to be a lookup in listA. Then a calculation is performed (not necessary modulo) and the result appended to newList. Then I do a statistical analysis on newList (say take an average for simplicity). newList will always be the same length as listB.

Upvotes: 0

Views: 147

Answers (5)

Freek Wiekmeijer
Freek Wiekmeijer

Reputation: 4940

The purpose of xrange is not to gain speed; its purpose is to reduce memory usage. The difference between range(N) and xrange(N) is that the latter doesn't expand to a list of size N but to a small generator object.

A few tips:

  • If your list is big, look into numpy. Numpy has efficient algorithms for array handling and uses native code internally.

  • Modulo is slow (if listB[n] % 2 == 1:). Better use a bitwise operator (if ListB[n]&1) in this case.

  • The if statement can go out: newList[n] = listA[1-ListB[n]&1] for each value of n in range. Invert the order of listA to get git of the 1- and save another integer op.

Upvotes: 3

pzp
pzp

Reputation: 6597

First, replace the modulo operator, n % 2, with the bitwise and operator, n & 1. Next, instead of accessing listB by index, just iterate through its items directly using in. You can remove listA entirely. These small improvements should should speed things up slightly.

newList = ((n & 1) + 1 for n in listB)

The real advantage of this code though, is that it is a generator comprehension, not a list comprehension. Although this doesn't make it any faster, it does make it much more memory efficient. That being said, it also has some disadvantages; you cannot access the entire list, and once you access a value it is gone. If you only intend on iterating through newList or performing some calculation on each item of newList this will be fine. If not, then make newList a list comprehension:

newList = [(n & 1) + 1 for n in listB]

Best of luck!

Upvotes: 1

Padraic Cunningham
Padraic Cunningham

Reputation: 180411

Just loop over listB and set two variables at the start instead of repeatedly indexing:

newList = []


i, j = listA[0], listA[1]
for n in listB:
    if n % 2:
        newList.append(i)
    else:
        newList.append(j)

Or use a list comp:

 [i if n % 2 else j for n in listB]

Timings:

In [4]: %%timeit
newList = ['NaN']*len(listB)
for n in xrange(len(listB)):
    if listB[n] % 2 == 1:
        newList[n] = listA[0]
    else:
        newList[n] = listA[1]
   ...: 
100000 loops, best of 3: 2.33 µs per loop

In [5]: %%timeit
   ...: i,j = listA[0], listA[1]
   ...: [i if n % 2 else j for n in listB]
   ...: 
1000000 loops, best of 3: 1.12 µs per loop

In [16]: %%timeit
....: newList = []
....: i,j = listA[0], listA[1]
....: for  n in listB:
....:     if n % 2 == 1:
....:         newList.append(i)
....:     else:
....:         newList.append(j)
....: 
1000000 loops, best of 3: 1.88 µs per loop

In [18]: timeit [listA[1 - x%2] for x in listB]
1000000 loops, best of 3: 1.38 µs per loop

Using if n & 1 is slightly faster:

In [11]: %%timeit
i,j = listA[0], listA[1]
[i if n & 1 else j for n in listB]
   ....: 
1000000 loops, best of 3: 1.04 µs per loop

So indexing always adds more overhead whether in a list comp or a loop. It is pointless continually indexing listA when you just want the two values.

If you want more speed compiling with cython and simply typing a couple of variables cuts down the runtime:

In [31]: %%cython
   ....: def faster(l1,l2):
   ....:     cdef int i,j,n
   ....:     i, j = l1[0], l2[1]
   ....:     return [i if n & 1 else j for n in l2]
   ....: 

In [32]: 

In [32]: timeit faster(listA,listB)
1000000 loops, best of 3: 455 ns per loop

If you are doing a lot of numeric calculations you may want to look further into cython and or numpy.

Upvotes: 0

DeepSpace
DeepSpace

Reputation: 81604

Using list comprehension seems to cut some time:

listB = [i for i in xrange(1,1000000)]
start = clock()

listA = [1,2]

for n in xrange(len(listB)):
    if listB[n] % 2 == 1:
        newList[n] = listA[0]
    else:
        newList[n] = listB[1]

print "Time taken = %.5f" % (clock() - start)

>>> 0.21216

Compared to:

listB = [i for i in xrange(1,1000000)]
start = clock()

listA = [1,2]

newList = [listA[0] if i%2 == 1 else listA[1] for i in listB]

print "Time taken = %.5f" % (clock() - start)

>> 0.15658

Upvotes: 1

bereal
bereal

Reputation: 34281

The shortest and, perhaps, fastest way would be using list comprehension:

newList = [listA[1 - x%2] for x in listB]

Upvotes: 3

Related Questions