Grouping array to nested structure with numpy

Question

Say I've got a numpy array like this (larger and with different number of repetitions per date):

data = np.array([
   ["2011-01-01", 24, 554, 66],
   ["2011-01-01", 44, 524, 62],
   ["2011-01-04", 23, 454, 32],
   ["2011-01-04", 22, 45,  42],
   ["2011-01-04", 14, 364, 12]
])

Now I'd like to group the rows by date into a nested structure:

[              
   ["2011-01-01", [[24, 554, 66], [44, 524, 62]]],
   ["2011-01-04", [[23, 454, 32], [22, 45, 42], [14, 364, 12]]]  
]

I do know how to do it by looping through my array and appending elements, but this seems to me extremely unpythonic. Is there some inbuilt numpy function to perform this operation or some custom one-liner for such a task?

cge · Accepted Answer

I'm not quite sure how you're storing the dates; the example you give will not actually work as intended. Without quotes the dates would be interpreted as arithmetic, and with quotes numpy converts every entry, numbers included, to a string. However, if you have a particular date (call it date) that you want that nested array for, you can easily get it through indexing:

data[ data[:,0]==date, 1: ]

That will select every row with the date you want, and then give you only the numbers. If you wanted this for each date, you could use the following:

[ [ date, data[ data[:,0]==date, 1: ] ] for date in np.unique(data[:,0]) ]

Note that this will give you the nested list part as a numpy array; if you want it as a normal list, call .tolist() on it.
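
For a version that runs end to end, here is a minimal sketch of the same mask-per-unique-date approach. It assumes the dates and the numbers are kept in two parallel arrays so that the numbers keep an integer dtype. That layout and the names dates, values and grouped are illustrative assumptions, not something taken from the question.

```python
import numpy as np

# Assumption: dates and numbers live in two parallel arrays (row i of
# `values` belongs to dates[i]), so the numbers are not coerced to strings.
dates = np.array([
    "2011-01-01", "2011-01-01",
    "2011-01-04", "2011-01-04", "2011-01-04",
])
values = np.array([
    [24, 554, 66],
    [44, 524, 62],
    [23, 454, 32],
    [22, 45, 42],
    [14, 364, 12],
])

# Build one boolean mask per unique date and select the matching rows.
# .tolist() converts each selected block into plain nested Python lists.
grouped = [
    [str(date), values[dates == date].tolist()]
    for date in np.unique(dates)
]

print(grouped)
```

Each pass through the comprehension scans the whole dates array, so the cost grows with the number of rows times the number of distinct dates. That is usually fine for modest inputs.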
