Reputation: 31329
I'm trying to improve the speed of some Python code a bit, so I'm trying to move a standard for loop to either a list comprehension or a map call:
buf = [0 for i in range(self.numLEDs * 3)]
temp = [0,0,0]
for x in range(self.numLEDs):
    r = data[x*3]
    g = data[x*3+1]
    b = data[x*3+2]
    temp[self.c_order[0]] = self.gamma[r]
    temp[self.c_order[1]] = self.gamma[g]
    temp[self.c_order[2]] = self.gamma[b]
    buf[x * 3:x * 3 + 3] = temp
c_order is simply another list, in this case [1,2,0]. It controls the channel order for some RGB pixels. gamma is a list 256 elements long that holds gamma corrected values for each of the 8bit channel values.
What I'd like to do is completely remove any use of a standard for loop from this bit of code. I've managed to do it with the gamma correction but without the channel swap, and it's twice as fast. Like this:
corrected = [gamma[i] for i in data]
buf[0:len(corrected)] = corrected
How can I swap the order of list elements as I go without a for loop though?
Upvotes: 1
Views: 282
Reputation: 97281
So you need pure Python code without any extension library.
Here is code that speeds it up:
class Test(object):
    def __init__(self, n):
        self.numLEDs = n
        self.c_order = [1, 2, 0]
        self.gamma = [i // 2 for i in range(256)]

    def do1(self, data):
        # original loop from the question
        buf = [0 for i in range(self.numLEDs * 3)]
        temp = [0,0,0]
        for x in range(self.numLEDs):
            r = data[x*3]
            g = data[x*3+1]
            b = data[x*3+2]
            temp[self.c_order[0]] = self.gamma[r]
            temp[self.c_order[1]] = self.gamma[g]
            temp[self.c_order[2]] = self.gamma[b]
            buf[x * 3:x * 3 + 3] = temp
        return buf

    def do2(self, data):
        # one strided slice assignment per channel
        buf = [0] * (self.numLEDs * 3)
        gamma = self.gamma  # local name avoids repeated attribute lookups
        for idx, idx2 in enumerate(self.c_order):
            buf[idx2::3] = [gamma[v] for v in data[idx::3]]
        return buf

import random
random.seed(0)
N = 1000
t = Test(N)
data = [random.randint(0, 255) for i in range(3*N)]
r1 = t.do1(data)
r2 = t.do2(data)
print(r1 == r2) # check the result
%timeit t.do1(data)
%timeit t.do2(data)
The output shows it is about 6x faster:
True
1000 loops, best of 3: 1.1 ms per loop
10000 loops, best of 3: 176 µs per loop
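To see what the strided slice assignment in do2 is doing, here is a small sketch on two pixels. The identity gamma table and the sample values are assumptions chosen only so the channel moves are easy to follow; c_order is the one from the question.

```python
# Two pixels, channels stored as r, g, b, r, g, b.
data = [10, 20, 30, 40, 50, 60]
c_order = [1, 2, 0]          # same channel order as in the question
gamma = list(range(256))     # identity table (assumption) so the moves stay visible

buf = [0] * len(data)
for src, dst in enumerate(c_order):
    # Every third value starting at src goes to every third slot starting at dst.
    buf[dst::3] = [gamma[v] for v in data[src::3]]

print(buf)  # [30, 10, 20, 60, 40, 50]
```

Each pixel comes out as b, r, g, which is what the original loop produces for c_order = [1, 2, 0], but the Python-level loop now runs three times instead of once per LED.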
Upvotes: 1
Reputation: 54330
You can have everything done in numpy in a few lines, and slightly faster:
In [69]:
import numpy as np
gamma=list(np.random.rand(256))
numLEDs=10
data=list(np.random.randint(0,256,30))
c_order=[0,1,2]
In [70]:
%%timeit
buf = [0 for i in range(numLEDs * 3)]
temp = [0,0,0]
for x in range(numLEDs):
    r = data[x*3]
    g = data[x*3+1]
    b = data[x*3+2]
    temp[c_order[0]] = gamma[r]
    temp[c_order[1]] = gamma[g]
    temp[c_order[2]] = gamma[b]
    buf[x * 3:x * 3 + 3] = temp
10000 loops, best of 3: 47.3 µs per loop
In [85]:
gamma=np.array(gamma)
data=np.array(data)
In [86]:
%%timeit
data_array=data.reshape(3, -1, order='F')
np.take(gamma[data_array], c_order, axis=0).ravel(order='F')
10000 loops, best of 3: 38.3 µs per loop
When you have a lot of LEDs, the numpy version will be much faster than the loop version:
In [98]:
gamma=list(np.random.rand(256))
numLEDs=1000
data=list(np.random.randint(0,256,3000))
c_order=[0,1,2]
In [99]:
%%timeit
buf = [0 for i in range(numLEDs * 3)]
temp = [0,0,0]
for x in range(numLEDs):
    r = data[x*3]
    g = data[x*3+1]
    b = data[x*3+2]
    temp[c_order[0]] = gamma[r]
    temp[c_order[1]] = gamma[g]
    temp[c_order[2]] = gamma[b]
    buf[x * 3:x * 3 + 3] = temp
100 loops, best of 3: 4.08 ms per loop
In [100]:
gamma=np.array(gamma)
data=np.array(data)
In [101]:
%%timeit
data_array=data.reshape(3, -1, order='F')
np.take(gamma[data_array], c_order, axis=0).ravel(order='F')
1000 loops, best of 3: 244 µs per loop
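One thing worth checking before using this with a real channel order: np.take(..., c_order, axis=0) gathers, so output channel i comes from input channel c_order[i], while the loop in the question scatters, writing input channel i to output slot c_order[i]. The two agree for c_order=[0,1,2] as used in the timings, but not for [1,2,0]. Below is a minimal sketch that reproduces the loop's behaviour; the identity gamma table and the sample data are assumptions for illustration.

```python
import numpy as np

c_order = [1, 2, 0]
gamma = np.arange(256)                     # identity table (assumption)
data = np.array([10, 20, 30, 40, 50, 60])  # two sample pixels (assumption)

# Reference: same assignments as the loop in the question.
expected = [0] * data.size
for x in range(data.size // 3):
    for ch in range(3):
        expected[x * 3 + c_order[ch]] = gamma[data[x * 3 + ch]]

pixels = gamma[data].reshape(-1, 3)        # one row per LED: r, g, b
out = np.empty_like(pixels)
out[:, c_order] = pixels                   # scatter: out[:, c_order[i]] = pixels[:, i]
result = out.ravel()

# Equivalent gather form: take with the inverse permutation.
alt = np.take(pixels, np.argsort(c_order), axis=1).ravel()

print(result.tolist())  # [30, 10, 20, 60, 40, 50]
```

If the channel order is fixed for the lifetime of the object, np.argsort(c_order) can be computed once up front rather than on every call.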
Upvotes: 3
Reputation: 59601
Contrary to popular belief, calling map will not give you a significant speedup. You may actually see worse performance.
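If you want to check this on your own machine, a rough sketch with timeit could look like the following. The gamma table and data here are stand-ins (assumptions), and the numbers will vary by interpreter version, so treat the result as a measurement rather than a rule.

```python
import timeit

gamma = [i // 2 for i in range(256)]       # stand-in lookup table (assumption)
data = [i % 256 for i in range(3000)]      # stand-in channel data (assumption)

def with_comprehension():
    return [gamma[v] for v in data]

def with_map():
    # list() is needed on Python 3, where map returns an iterator.
    return list(map(gamma.__getitem__, data))

for fn in (with_comprehension, with_map):
    seconds = timeit.timeit(fn, number=200)
    print(fn.__name__, seconds)
```

Both functions return the same list, so any difference in the printed times comes only from how the lookup is driven.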
Depending on how long you spend in this section of code, this may be the perfect situation where simply porting this loop to C makes sense. See here.
Make sure that you're actually spending a lot of time in this for-loop, otherwise the overhead of calling your C code will outweigh any potential performance gains.
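One way to check where the time goes is the standard library profiler. This is only a sketch: convert is a hypothetical wrapper that does the same work as the loop in the question, and the repeat count stands in for whatever your real program does.

```python
import cProfile
import io
import pstats

def convert(data, gamma, c_order):
    # Hypothetical wrapper: same result as the loop in the question.
    num_leds = len(data) // 3
    buf = [0] * (num_leds * 3)
    for x in range(num_leds):
        for ch in range(3):
            buf[x * 3 + c_order[ch]] = gamma[data[x * 3 + ch]]
    return buf

gamma = list(range(256))                   # identity table (assumption)
data = [i % 256 for i in range(3000)]      # stand-in channel data (assumption)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):                       # stand-in for the real program's work
    result = convert(data, gamma, [1, 2, 0])
profiler.disable()

# Collect the report into a string and show the five most expensive entries.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If convert does not dominate the cumulative time in your real program's report, porting it to C is unlikely to pay off.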
Read here for some potential alternatives if you decide to port this code to C:
Upvotes: 0