Reputation: 905
I need to store a data structure like this:
{'x1,y1,z1': [[p11_x,p11_y,p11_z], [p12_x,p12_y,p12_z], ..., [p1n_x,p1n_y,p1n_z]],
'x2,y2,z2': [[p21_x,p21_y,p21_z], [p22_x,p22_y,p22_z], ..., [p2n_x,p2n_y,p2n_z]],
...
'xn,yn,zn': [[pn1_x,pn1_y,pn1_z], [pn2_x,pn2_y,pn2_z], ..., [pnm_x,pnm_y,pnm_z]]}
Every key is a grid cell index and the value is a list of classified points. The list can be variable length, but I can make it a fixed size, for example 1000 elements.
For now I tried something like this:
np.zeros(shape=(100,100,100,50,3))
But if I use numba.jit with that function, the execution time is a few times worse than with pure Python.
Simple Python example of what I want to do:
def split_into_grid_py(points: np.array):
    grid = {}
    for point in points:
        r_x = round(point[0])
        r_y = round(point[1])
        r_z = round(point[2])
        try:
            grid[(r_x, r_y, r_z)].append(point)
        except KeyError:
            grid[(r_x, r_y, r_z)] = [point]
    return grid
Is there any efficient way of doing that with Numba? Timed over 10 executions in a loop with the same data set, the Numba version is a few times slower, so it's poor optimization.
My numba code:
@numba.jit(nopython=True)
def split_into_grid(points: np.array):
    grid = np.zeros(shape=(100,100,100,50,3))
    for point in points:
        r_x = round(point[0])
        r_y = round(point[1])
        r_z = round(point[2])
        i = 0
        for cell in grid[r_x][r_y][r_z]:
            if not np.sum(cell):
                grid[r_x][r_y][r_z][i] = point
                break
            i += 1
    return grid
Upvotes: 1
Views: 87
Reputation: 50826
The pure Python version appends items in O(1) time thanks to the dictionary container, while the Numba version uses an O(n) array search (bounded by 50). Moreover, np.zeros(shape=(100,100,100,50,3)) allocates an array of about 1 GiB, which results in many cache misses since the computation has to be done in RAM. Meanwhile, the pure Python version may fit in the CPU caches. There are two strategies to solve that.
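For reference, the arithmetic behind that figure (float64, so 8 bytes per element):

    # 100*100*100 cells * 50 slots * 3 coordinates * 8 bytes
    >>> 100 * 100 * 100 * 50 * 3 * 8 / 2**30
    1.1175870895385742   # i.e. roughly 1.1 GiB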
The first strategy is to use 3 containers: an array keyGrid mapping each grid cell to an offset into a second array valueGrid (or -1 if there is no point associated with this cell); valueGrid, which contains all the points for a given grid cell; and finally countingGrid, which counts the number of points per grid cell. Here is an untested example:
import numpy as np
import numba

@numba.jit(nopython=True)
def split_into_grid(points: np.array):
    # Signed type so that -1 can mark empty cells; a smaller type
    # (e.g. np.int16) can be used if few enough grid cells are filled
    keyGrid = np.full((100, 100, 100), -1, dtype=np.int32)
    i = 0
    for point in points:
        r_x = round(point[0])
        r_y = round(point[1])
        r_z = round(point[2])
        if keyGrid[r_x, r_y, r_z] < 0:
            keyGrid[r_x, r_y, r_z] = i
            i += 1
    # Number of grid cells containing at least one point
    uniqueCloudPointCount = i
    # Note: the number of points per grid cell is also bounded by the type (255 here)
    countingGrid = np.zeros(uniqueCloudPointCount, dtype=np.uint8)
    # float64 so fractional coordinates are preserved; -1 marks unused slots
    valueGrid = np.full((uniqueCloudPointCount, 50, 3), -1, dtype=np.float64)
    for point in points:
        r_x = round(point[0])
        r_y = round(point[1])
        r_z = round(point[2])
        key = keyGrid[r_x, r_y, r_z]
        addingPos = countingGrid[key]
        valueGrid[key, addingPos] = point
        countingGrid[key] += 1
    return (keyGrid, valueGrid, countingGrid)
Note that the arrays are quite small as long as not all grid cells contain points, resulting in fewer cache misses. Moreover, the mapping of each point is done in (small) constant time, resulting in much faster code.
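As a usage sketch (not part of the code above; the helper name is made up), the returned triple could be queried for one cell like this:

    import numpy as np

    def points_in_cell(keyGrid, valueGrid, countingGrid, r_x, r_y, r_z):
        # keyGrid stores -1 for cells that never received a point
        key = keyGrid[r_x, r_y, r_z]
        if key < 0:
            return np.empty((0, 3), dtype=valueGrid.dtype)
        # Only the first countingGrid[key] slots of this cell are filled
        return valueGrid[key, :countingGrid[key]]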
The second strategy is to use the same method as in the pure Python implementation, but with Numba types. Indeed, Numba experimentally supports dictionaries. You can replace the exception with a dictionary check ((r_x, r_y, r_z) in grid), which causes fewer compilation issues and likely speeds up the resulting code. Note that Numba dicts are often only about as fast as CPython ones (if not slower), so the resulting code may not be much faster.
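As an untested sketch of that second strategy (it assumes a Numba version recent enough to support nested typed containers, i.e. a typed.Dict whose values are typed.Lists of point arrays), it could look like this:

    import numpy as np
    import numba
    from numba import types
    from numba.typed import Dict, List

    # One point is stored as a 1D float64 array
    point_type = types.float64[:]

    @numba.njit
    def split_into_grid_dict(points):
        grid = Dict.empty(
            key_type=types.UniTuple(types.int64, 3),
            value_type=types.ListType(point_type),
        )
        for i in range(points.shape[0]):
            point = points[i]
            # Build the key as a plain int64 tuple so it matches the declared key type
            key = (np.int64(round(point[0])),
                   np.int64(round(point[1])),
                   np.int64(round(point[2])))
            # Membership test instead of try/except
            if key in grid:
                grid[key].append(point)
            else:
                lst = List.empty_list(point_type)
                lst.append(point)
                grid[key] = lst
        return grid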
Upvotes: 2