Termo

Reputation: 71

Length of lists in array

I'm using scipy.spatial.cKDTree.query_ball_point to get the number of data points within a specific radius from each point in a grid layout.

It works, but it returns an array of lists, and I only need the length of each list. Of course I can iterate through the array, but there must be a smarter way to get the length of each list in an array of lists, or maybe another way to find the number of data points within a specific radius of each grid point.

Any ideas on how to do this most efficiently?

Upvotes: 0

Views: 121

Answers (4)

hpaulj

Reputation: 231605

The docs for this function say:

If x is an array of points, returns an object array of shape tuple containing lists of neighbors

where

x : array_like, shape tuple + (self.m,)

The talk of "shape tuple" is a little unclear, but I think it refers to x.shape[:-1], all but the last dimension of the input array. So for n points in a 2d space, x will be (n,2), and the result will have shape (n,).
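To make that reading of "shape tuple" concrete, here is a small sketch. The data points, the grid, and the radius are all made up for illustration; only the shapes matter.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data = rng.random((100, 2))      # 100 hypothetical data points in 2d
tree = cKDTree(data)

# A 4x5 grid of 2d query points, so the input has shape (4, 5, 2)
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 4))
grid = np.stack([gx, gy], axis=-1)

neighbors = tree.query_ball_point(grid, r=0.2)

print(grid.shape)        # (4, 5, 2)
print(neighbors.shape)   # (4, 5), i.e. grid.shape[:-1]
print(neighbors.dtype)   # object: each element is a Python list of indices
```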

For a simple 1d array of lists, a plain list comprehension is the best way:

In [36]: x=np.array([[1,2,3],[],[3,4]], dtype=object)

In [37]: x
Out[37]: array([[1, 2, 3], [], [3, 4]], dtype=object)

In [39]: [len(i) for i in x]
Out[39]: [3, 0, 2]

len(x) and x.shape apply to the array itself, not to its elements.

x contains pointers to the lists, so any operation on those lists requires Python-level access to each one. There aren't many vectorized array operations that propagate down to the elements of an object array. After all, the elements of such an array may be anything, including None.

If your input array is higher dimensional, e.g. (10,20,2), a 10x20 grid of points, it's probably easiest to flatten the result first.

In [50]: X
Out[50]: 
array([[[1, 2, 3], [1]],
       [[1, 2, 3], [3, 4]]], dtype=object)

In [51]: np.array([len(i) for i in X.flat]).reshape(2,2)
Out[51]: 
array([[3, 1],
       [3, 2]])

In sum, a list comprehension is the way to go, even though x is an array.

===============

There is another way of iterating over an array that handles multiple dimensions well: np.frompyfunc. In some tests it may save 20% over a list comprehension.

np.frompyfunc(len,1,1)(x).astype(int)

It returns an array of the right shape, though it too is dtype object, hence the astype call. np.vectorize uses this internally, but makes no claim to improving speed.
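As a self-contained sketch of the np.frompyfunc approach on a 2d object array: the array X below is built by hand to mirror the 2x2 example above, and vlen is just a name picked for illustration.

```python
import numpy as np

# Fill a 2x2 object array element by element, so NumPy stores the
# lists as objects instead of trying to build a numeric array.
X = np.empty((2, 2), dtype=object)
X[0, 0], X[0, 1] = [1, 2, 3], [1]
X[1, 0], X[1, 1] = [1, 2, 3], [3, 4]

# frompyfunc(func, nin, nout) wraps len so it is applied element-wise;
# the result is again dtype object, so convert it to int.
vlen = np.frompyfunc(len, 1, 1)
counts = vlen(X).astype(int)

print(counts.shape)   # (2, 2), no flatten/reshape needed
print(counts)
```

The shape of the input carries through, which is the main convenience over the flat list comprehension plus reshape.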

Upvotes: 1

Eelco Hoogendoorn

Reputation: 10759

Sounds like you are looking for this: scipy.spatial.cKDTree.count_neighbors
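A rough sketch of how this could look, with made-up data, grid, and radius. Two caveats that are assumptions about your setup rather than anything stated in the question: as far as I can tell, count_neighbors with a scalar radius returns one total number of pairs rather than a count per grid point, and the return_length argument of query_ball_point needs SciPy 1.3 or newer.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data = rng.random((200, 2))   # hypothetical data points
grid = rng.random((10, 2))    # hypothetical grid/query points
r = 0.1                       # hypothetical radius

tree = cKDTree(data)

# Per-point counts directly, without building the neighbor lists
# (return_length requires SciPy >= 1.3).
counts = tree.query_ball_point(grid, r, return_length=True)

# Reference: lengths of the lists returned by the default call.
ref = np.array([len(n) for n in tree.query_ball_point(grid, r)])

# count_neighbors works on two trees and gives the total pair count.
total = cKDTree(grid).count_neighbors(tree, r)

print(counts)
print(total)
```

If you only need one count per grid point, the return_length form is probably the closest match; count_neighbors is more useful when you want the overall number of pairs.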

Upvotes: 0

Paul

Reputation: 333

You can also use a more "functional" way:

lengths = map(len, myarray)

In Python 3 it returns a lazy map object that you can iterate over; wrap it in list() if you need all the lengths at once.
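For example, assuming myarray is a 1d object array of lists like the one in the question (the values here are invented), the map object can be turned into a list or fed straight into a NumPy integer array:

```python
import numpy as np

# Hypothetical stand-in for the query_ball_point result
myarray = np.array([[1, 2, 3], [], [3, 4]], dtype=object)

# Materialize the lazy map object as a plain list
lengths = list(map(len, myarray))

# Or build an integer array directly from the iterator
lengths_arr = np.fromiter(map(len, myarray), dtype=int, count=len(myarray))

print(lengths)       # [3, 0, 2]
print(lengths_arr)
```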

Upvotes: 1

Vorsprung

Reputation: 34387

Call len() on each element in the array, i.e.:

lengths=[len(x) for x in myarray]

Upvotes: 2
