Reputation: 11
I have an array that looks something like:
[0 x1 0 0 y1 0 z1
0 0 x2 0 y2 0 z2
0 0 x3 0 0 y3 z3
0 0 x4 0 0 y4 z4
0 x5 0 0 0 y5 z5
0 0 0 0 y6 0 0]
I need to determine set of connected line (i.e. line that connects to the points [x1,x2,x3..], [y1,y2,y3...], [z1,z2,z3..]) from the array and then need to find maximum value in each line i.e. max{x1,x2,x3,...}, max{y1,y2,y3..} etc. i was trying to do nearest neighbor search using kdtree but it return the same array. I have array of the size (200 x 8000). is there any easier way to do this? Thx.
Upvotes: 1
Views: 677
Reputation: 21839
Another way of speeding up your line searching algorithm would be to pre-calculate the start points of each line, and then apply the expensive logic to calculate lines from each of these points.
I have taken a limited view of the logic (because you haven't provided the full line identification logic), which can compute the start points in fast vectorised code.
The first step in being able to implement such a thing in fast vectorised code is to be able to figure out which points are in a line, but their direct points above are not:
import numpy
# using the array that was provided in the question
a = """0 x1 0 0 y1 0 z1
0 0 x2 0 y2 0 z2
0 0 x3 0 0 y3 z3
0 0 x4 0 0 y4 z4
0 x5 0 0 0 y5 z5
0 0 0 0 y6 0 0"""
array = numpy.array([int(v.strip()) if v.strip().isdigit() else i for i, v in enumerate(a.split(' '))]).reshape(6, 7)
Results in an array which looks like:
>>> print repr(array)
array([[ 0, 1, 0, 0, 4, 0, 6],
[ 0, 0 9, 0, 11, 0, 13],
[ 0, 0, 16, 0, 0, 19, 20],
[ 0, 0, 23, 0, 0, 26, 27],
[ 0, 29, 0, 0, 0, 33, 34],
[ 0, 0, 0, 0, 39, 0, 0]])
From here, we can do some numpy rolling:
>>> print `numpy.roll(array, 1, axis=0)`
array([[ 0, 0, 0, 0, 39, 0, 0],
[ 0, 1, 0, 0, 4, 0, 6],
[ 0, 0, 9, 0, 11, 0, 13],
[ 0, 0, 16, 0, 0, 19, 20],
[ 0, 0, 23, 0, 0, 26, 27],
[ 0, 29, 0, 0, 0, 33, 34]])
Which can be combined to give us the vertical start points of the lines:
>>> potential_start_points = (array != 0) & (numpy.roll(array, 1, axis=0) == 0)
>>> # include the top row points, as they are certainly start points
>>> potential_start_points[0, :] = (array != 0)[0, :]
>>> print `potential_start_points`
array([[False, True, False, False, True, False, True],
[False, False, True, False, False, False, False],
[False, False, False, False, False, True, False],
[False, False, False, False, False, False, False],
[False, True, False, False, False, False, False],
[False, False, False, False, True, False, False]], dtype=bool)
From here, it is possible to refine the vectorised logic to pick out diagonals etc., but I would be tempted to iterate over each of the Trues and apply more complex index based logic.
xs, ys = numpy.where(potential_start_points)
for x, y in zip(xs, ys):
# do more complex logic here ...
After all, the problem, in this case, is now reduced from iterating over 6x7=42 numbers to iterating over just 7.
Upvotes: 1
Reputation: 21839
I don't know of anything which provides the functionality you desire out of the box. If you have already written the logic, and it is just slow, have you considered Cython-ing your code. For simple typed looping operations you could get a significant speedup.
Upvotes: 1