troy.unrau

Reputation: 1152

Split numpy recarray based on value in one column

My real data has some 10000+ items. I have a complicated NumPy record array in a format roughly like:

a = (((1., 2., 3.), 4., 'metadata1'), 
     ((1., 3., 5.), 5., 'metadata1'), 
     ((1., 2., 4.), 5., 'metadata2'),
     ((1., 2., 5.), 5., 'metadata2'),  
     ((1., 3., 8.), 5., 'metadata3'))

My columns are defined by dtype = [('coords', '3f4'), ('values', 'f4'), ('meta', 'S10')]. I get a list of all my possible meta values by doing set(a['meta']).

And I'd like to split it into smaller lists based on the 'meta' column. Ideally, I'd like results like:

a['metadata1'] == (((1., 2., 3.), 4.), ((1., 3., 5.), 5.))
a['metadata2'] == (((1., 2., 4.), 5.), ((1., 2., 5.), 5.))
a['metadata3'] == (((1., 3., 8.), 5.))

or

a[0] = (((1., 2., 3.), 4., 'metadata1'), ((1., 3., 5.), 5., 'metadata1'))
a[1] = (((1., 2., 4.), 5., 'metadata2'), ((1., 2., 5.), 5., 'metadata2'))
a[2] = (((1., 3., 8.), 5., 'metadata3'))

or any other conveniently split format.

For a large dataset, though, the former would be easier on memory. Any ideas on how to do this split? I've seen some other questions here, but they all test for numerical values.

Upvotes: 0

Views: 1158

Answers (1)

ebarr

Reputation: 7842

You can always select the rows for a given meta value with a boolean mask (a form of fancy indexing):

In [34]: a[a['meta']=='metadata2']
Out[34]: 
rec.array([(array([ 1.,  2.,  4.], dtype=float32), 5.0, 'metadata2'),
           (array([ 1.,  2.,  5.], dtype=float32), 5.0, 'metadata2')], 
          dtype=[('coords', '<f4', (3,)), ('values', '<f4'), ('meta', 'S10')])
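One caveat worth spelling out: on Python 3 an 'S10' field holds bytes, so the value being compared should be a bytes literal rather than a str. The following is a minimal sketch of the same selection, assuming the sample rows from the question; the names mask and subset are only illustrative.

```python
import numpy as np

# Sample data from the question; 'S10' stores the meta field as bytes.
dtype = [('coords', '3f4'), ('values', 'f4'), ('meta', 'S10')]
a = np.array([((1., 2., 3.), 4., b'metadata1'),
              ((1., 3., 5.), 5., b'metadata1'),
              ((1., 2., 4.), 5., b'metadata2'),
              ((1., 2., 5.), 5., b'metadata2'),
              ((1., 3., 8.), 5., b'metadata3')], dtype=dtype)

# Compare against a bytes literal so the types match on Python 3.
mask = a['meta'] == b'metadata2'

# Boolean indexing returns a copy holding only the matching rows.
subset = a[mask]
```

If the field is declared as 'U10' instead, plain str values can be used on both sides of the comparison.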

You can use this approach to create a lookup dictionary for the different meta types:

import numpy as np

# Map each distinct meta value to the sub-array of rows that carry it.
meta_dict = {}
for meta_type in np.unique(a['meta']):
    meta_dict[meta_type] = a[a['meta'] == meta_type]

This will be very inefficient if there are a large number of meta types, because every iteration scans the whole array to build its mask.
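One way to avoid the repeated scans is to sort once by the meta column and cut the sorted array where the value changes. This is a sketch of that alternative rather than part of the original answer; it assumes the sample rows from the question, with the meta field declared as 'U10' so the dictionary keys are plain strings.

```python
import numpy as np

# Sample data from the question; 'U10' is an assumption made for str keys.
dtype = [('coords', '3f4'), ('values', 'f4'), ('meta', 'U10')]
a = np.array([((1., 2., 3.), 4., 'metadata1'),
              ((1., 3., 5.), 5., 'metadata1'),
              ((1., 2., 4.), 5., 'metadata2'),
              ((1., 2., 5.), 5., 'metadata2'),
              ((1., 3., 8.), 5., 'metadata3')], dtype=dtype)

# Stable sort so rows sharing a meta value keep their original order.
order = np.argsort(a['meta'], kind='stable')
sorted_a = a[order]

# Distinct keys, plus the index where each key's run starts in sorted_a.
keys, starts = np.unique(sorted_a['meta'], return_index=True)

# Cut at every run start except the first one (which is index 0).
chunks = np.split(sorted_a, starts[1:])

# One entry per meta value, built from a single sort instead of one scan per key.
meta_dict = dict(zip(keys.tolist(), chunks))
```

The pieces returned by np.split are views into sorted_a, so the only full copy made is the sorted array itself.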

A more efficient solution might be to use a pandas DataFrame, whose groupby functionality performs exactly the task you describe.
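A rough sketch of the pandas route, assuming the sample rows from the question. The x, y and z column names are made up here to flatten the 3-component coords field into ordinary columns, and 'U10' is used for the meta field so the group keys are plain strings.

```python
import numpy as np
import pandas as pd

# Sample data from the question; 'U10' is an assumption made for str keys.
dtype = [('coords', '3f4'), ('values', 'f4'), ('meta', 'U10')]
a = np.array([((1., 2., 3.), 4., 'metadata1'),
              ((1., 3., 5.), 5., 'metadata1'),
              ((1., 2., 4.), 5., 'metadata2'),
              ((1., 2., 5.), 5., 'metadata2'),
              ((1., 3., 8.), 5., 'metadata3')], dtype=dtype)

# Flatten the coords sub-array into three scalar columns (hypothetical names).
df = pd.DataFrame({
    'x': a['coords'][:, 0],
    'y': a['coords'][:, 1],
    'z': a['coords'][:, 2],
    'values': a['values'],
    'meta': a['meta'],
})

# groupby yields (key, sub-frame) pairs; collect them into a dictionary.
groups = {key: group for key, group in df.groupby('meta')}
```

Each entry of groups is itself a DataFrame, so groups['metadata2'] holds just the two 'metadata2' rows.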

Upvotes: 2
