Reputation: 17871
Say I've got a numpy array like this (larger, and with a different number of repetitions per date):
import numpy as np
data = np.array([
["2011-01-01", 24, 554, 66],
["2011-01-01", 44, 524, 62],
["2011-01-04", 23, 454, 32],
["2011-01-04", 22, 45, 42],
["2011-01-04", 14, 364, 12]
])
Now I'd like to group the rows by date into a nested structure:
[
["2011-01-01", [[24, 554, 66], [44, 524, 62]]],
["2011-01-04", [[23, 454, 32], [22, 45, 42], [14, 364, 12]]]
]
I do know how to do it by looping through my array and appending elements, but that seems extremely unpythonic to me. Is there a built-in numpy function that performs this operation, or some custom one-liner for such a task?
Upvotes: 3
Views: 293
Reputation: 10759
This is a typical grouping problem, which can be solved efficiently using the numpy_indexed package (disclaimer: I am its author):
import numpy as np
import numpy_indexed as npi
unique, groups = npi.group_by(data[:, 0], data[:, 1:].astype(int))
While the currently accepted answer is not inelegant, it has quadratic performance. This solution is O(n log n) and avoids any Python loops, which makes it more 'numpythonic' :).
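For readers who would rather not add a dependency, the same sort-then-split idea can be sketched with plain NumPy. This is only an illustration of the general technique, not the implementation used inside numpy_indexed; the variable names (order, starts, result) are made up for the example, and it assumes the dates are stored as strings, as in the question.

```python
import numpy as np

data = np.array([
    ["2011-01-01", 24, 554, 66],
    ["2011-01-01", 44, 524, 62],
    ["2011-01-04", 23, 454, 32],
    ["2011-01-04", 22, 45, 42],
    ["2011-01-04", 14, 364, 12],
])

keys = data[:, 0]
# Mixing strings and numbers makes NumPy store everything as strings,
# so the numeric columns are converted back to integers here.
values = data[:, 1:].astype(int)

# A stable sort keeps rows in their original order within each date.
order = np.argsort(keys, kind="stable")
sorted_keys = keys[order]

# return_index gives the position where each unique date first appears
# in the sorted keys, i.e. the start of each group.
unique, starts = np.unique(sorted_keys, return_index=True)

# Cut the sorted values at the start of every group except the first.
groups = np.split(values[order], starts[1:])

# Assemble the nested structure from the question (one step per date, not per row).
result = [[date, group.tolist()] for date, group in zip(unique.tolist(), groups)]
print(result)
```

The sort dominates the cost, which is where the O(n log n) bound comes from; the final list comprehension only iterates once per unique date.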
Upvotes: 0
Reputation: 9890
I'm not quite sure how you're storing the dates; written without quotes, the dates in your example will not actually work, as they will be interpreted as arithmetic. However, if you have a particular date, stored in a variable date, that you want that nested array for, you can easily get it through indexing:
data[ data[:,0]==date, 1: ]
That will select every row with the date you want, and then give you only the numbers. If you wanted this for each date, you could use the following:
[ [ date, data[ data[:,0]==date, 1: ] ] for date in np.unique(data[:,0]) ]
Note that this will give you the nested list part as a numpy array; if you want it as a normal list, calling .tolist() on it does the conversion.
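Put together with the sample data from the question, the approach above might look like the following sketch. It assumes the dates are stored as strings; because the array mixes strings and numbers, NumPy stores every element as a string, so the example adds an astype(int) step that is not part of the original one-liner. The name grouped is just for illustration.

```python
import numpy as np

data = np.array([
    ["2011-01-01", 24, 554, 66],
    ["2011-01-01", 44, 524, 62],
    ["2011-01-04", 23, 454, 32],
    ["2011-01-04", 22, 45, 42],
    ["2011-01-04", 14, 364, 12],
])

# For each unique date, select the matching rows with a boolean mask,
# drop the date column, convert to integers, and turn the result into
# a plain nested list.
grouped = [
    [date, data[data[:, 0] == date, 1:].astype(int).tolist()]
    for date in np.unique(data[:, 0]).tolist()
]
print(grouped)
```

Each iteration builds a mask over the whole array, so the cost grows with the number of rows times the number of unique dates.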
Upvotes: 3