Reputation: 159
I'm kind of new to Python and I have to implement an "as fast as possible" version of this code.
s="<%dH" % (int(width*height),)
z=struct.unpack(s, contents)
heights = np.zeros((height,width))
for r in range(0,height):
    for c in range(0,width):
        elevation=z[((width)*r)+c]
        if (elevation==65535 or elevation<0 or elevation>20000):
            elevation=0.0
        heights[r][c]=float(elevation)
I've read some of the Python vectorization questions... but I don't think they apply to my case. Most of the questions are about things like using np.sum instead of for loops. I guess I have two questions:
heights[r][c]=float(elevation) is where the bottleneck is. I need to find some Python timing commands to confirm this.
cython, pypy, weave. I could do this faster in C, but this code also needs to generate plots, so I'd like to stick with Python so I can use matplotlib.
Upvotes: 3
Views: 141
Reputation: 352959
As you mention, the key to writing fast code with numpy is vectorization: pushing the work off to fast C-level routines instead of Python loops. In the timings below, the usual approach improves things by a factor of roughly fifteen relative to your original code:
def faster(elevation, height, width):
    heights = np.array(elevation, dtype=float)
    heights = heights.reshape((height, width))
    heights[(heights < 0) | (heights > 20000)] = 0
    return heights
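The timings that follow call a function named orig, which is not shown. Presumably it is just the loop from the question wrapped in a function. Here is a sketch under that assumption; the signature orig(z, height, width) is inferred from the calls, not taken from the original code:

```python
import numpy as np

def orig(z, height, width):
    # Assumed reconstruction: the question's nested loop, wrapped in a function.
    heights = np.zeros((height, width))
    for r in range(height):
        for c in range(width):
            elevation = z[width * r + c]
            # 65535 and any out-of-range value are replaced by zero.
            if elevation == 65535 or elevation < 0 or elevation > 20000:
                elevation = 0.0
            heights[r][c] = float(elevation)
    return heights
```

Note that faster needs no separate test for 65535, since that value is already caught by the > 20000 mask.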
>>> h,w = 100, 101; z = list(range(h*w))
>>> %timeit orig(z,h,w)
100 loops, best of 3: 9.71 ms per loop
>>> %timeit faster(z,h,w)
1000 loops, best of 3: 641 µs per loop
>>> np.allclose(orig(z,h,w), faster(z,h,w))
True
That ratio seems to hold even for longer z:
>>> h,w = 1000, 10001; z = list(range(h*w))
>>> %timeit orig(z,h,w)
1 loops, best of 3: 9.44 s per loop
>>> %timeit faster(z,h,w)
1 loops, best of 3: 675 ms per loop
Upvotes: 6