Reputation: 55
I know Python isn't built for speed but I would like to improve the performance of the following code:
listA = [1,2]
listB = [1,2,3,4,5,6,7,8,9,10]
# pre-allocate for speed. Appending empty list is slower?
newList = ['NaN']*len(listB)
# Do I need a loop? Can I use something faster?
for n in xrange(len(listB)):
if listB[n] % 2 == 1:
newList[n] = listA[0]
else:
newList[n] = listA[1]
My issue is listB
can get pretty large.
I have already pre-allocated memory for newList
and used xrange
. I believe these provide significant speed increases for large lists.
But do I even need a for loop at all since each loop is not dependent on the previous result. Does python have an array type?
Can I break up listB
and run the operation in parallel similar to parfor
in Matlab?
ADDITIONAL INFO: For my problem, as listA gets bigger, listB gets exponentially bigger.
For each item in listB there needs to be a lookup in listA. Then a calculation is performed (not necessary modulo) and the result appended to newList. Then I do a statistical analysis on newList (say take an average for simplicity). newList will always be the same length as listB.
Upvotes: 0
Views: 147
Reputation: 4940
The purpose of xrange
is not to gain speed; its purpose is to reduce memory usage. The difference between range(N)
and xrange(N)
is that the latter doesn't expand to a list of size N but to a small generator object.
A few tips:
If your list is big, look into numpy. Numpy has efficient algorithms for array handling and uses native code internally.
Modulo is slow (if listB[n] % 2 == 1:
). Better use a bitwise operator (if ListB[n]&1
) in this case.
The if
statement can go out: newList[n] = listA[1-ListB[n]&1]
for each value of n in range. Invert the order of listA
to get git of the 1-
and save another integer op.
Upvotes: 3
Reputation: 6597
First, replace the modulo operator, n % 2
, with the bitwise and operator, n & 1
. Next, instead of accessing listB
by index, just iterate through its items directly using in
. You can remove listA entirely. These small improvements should should speed things up slightly.
newList = ((n & 1) + 1 for n in listB)
The real advantage of this code though, is that it is a generator comprehension, not a list comprehension. Although this doesn't make it any faster, it does make it much more memory efficient. That being said, it also has some disadvantages; you cannot access the entire list, and once you access a value it is gone. If you only intend on iterating through newList
or performing some calculation on each item of newList this will be fine. If not, then make newList
a list comprehension:
newList = [(n & 1) + 1 for n in listB]
Best of luck!
Upvotes: 1
Reputation: 180411
Just loop over listB and set two variables at the start instead of repeatedly indexing:
newList = []
i, j = listA[0], listA[1]
for n in listB:
if n % 2:
newList.append(i)
else:
newList.append(j)
Or use a list comp:
[i if n % 2 else j for n in listB]
Timings:
In [4]: %%timeit
newList = ['NaN']*len(listB)
for n in xrange(len(listB)):
if listB[n] % 2 == 1:
newList[n] = listA[0]
else:
newList[n] = listA[1]
...:
100000 loops, best of 3: 2.33 µs per loop
In [5]: %%timeit
...: i,j = listA[0], listA[1]
...: [i if n % 2 else j for n in listB]
...:
1000000 loops, best of 3: 1.12 µs per loop
In [16]: %%timeit
....: newList = []
....: i,j = listA[0], listA[1]
....: for n in listB:
....: if n % 2 == 1:
....: newList.append(i)
....: else:
....: newList.append(j)
....:
1000000 loops, best of 3: 1.88 µs per loop
In [18]: timeit [listA[1 - x%2] for x in listB]
1000000 loops, best of 3: 1.38 µs per loop
Using if n & 1 is slightly faster:
In [11]: %%timeit
i,j = listA[0], listA[1]
[i if n & 1 else j for n in listB]
....:
1000000 loops, best of 3: 1.04 µs per loop
So indexing always adds more overhead whether in a list comp or a loop. It is pointless continually indexing listA when you just want the two values.
If you want more speed compiling with cython and simply typing a couple of variables cuts down the runtime:
In [31]: %%cython
....: def faster(l1,l2):
....: cdef int i,j,n
....: i, j = l1[0], l2[1]
....: return [i if n & 1 else j for n in l2]
....:
In [32]:
In [32]: timeit faster(listA,listB)
1000000 loops, best of 3: 455 ns per loop
If you are doing a lot of numeric calculations you may want to look further into cython and or numpy.
Upvotes: 0
Reputation: 81604
Using list comprehension seems to cut some time:
listB = [i for i in xrange(1,1000000)]
start = clock()
listA = [1,2]
for n in xrange(len(listB)):
if listB[n] % 2 == 1:
newList[n] = listA[0]
else:
newList[n] = listB[1]
print "Time taken = %.5f" % (clock() - start)
>>> 0.21216
Compared to:
listB = [i for i in xrange(1,1000000)]
start = clock()
listA = [1,2]
newList = [listA[0] if i%2 == 1 else listA[1] for i in listB]
print "Time taken = %.5f" % (clock() - start)
>> 0.15658
Upvotes: 1
Reputation: 34281
The shortest and, perhaps, fastest way would be using list comprehension:
newList = [listA[1 - x%2] for x in listB]
Upvotes: 3