marty

Reputation: 141

Best data structure for sparse data with multiple dimensions

I want to structure my data (similarly to pandas) to allow easy data exploration. I tried using xarray.DataArray for this task (the pandas documentation recommends xarray for n-dimensional data now that Panel4D and PanelND are deprecated: http://pandas.pydata.org/pandas-docs/stable/dsintro.html#panel4d-and-panelnd-deprecated), but it appears inefficient given that my data is sparse. Is there a better way to structure my data, either with xarray.DataArray or with another Python data structure, that still allows easy data exploration?

Description of data

My data consists of prescriptions given to patients. Each entry consists of:

There might be several prescriptions on a date for different patients. A patient might also be prescribed several drugs (e.g., 2-3 drugs) at the same time with 'mandatory' dosage and 'optional/as needed' dosage. My dataset currently consists of 397 different patients, 1520 different dates and 161 different drugs. I only have 21790 non-null entries out of the 397*1520*161*2 entries (i.e., 0.01%).
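To make the sparsity concrete, here is a quick back-of-the-envelope check of the figures above. The 8 bytes per cell is an assumption (float64, which a NaN-filled array would typically use), not something measured on the real dataset.

```python
# Figures taken from the description above.
n_patients, n_dates, n_drugs, n_timings = 397, 1520, 161, 2
n_non_null = 21790

n_cells = n_patients * n_dates * n_drugs * n_timings
density = n_non_null / n_cells

# Assumption: 8 bytes per cell (float64).
dense_bytes = n_cells * 8

print(f"cells in the dense array: {n_cells:,}")                   # 194,307,680
print(f"fraction non-null:        {density:.4%}")                 # 0.0112%
print(f"dense size (float64):     {dense_bytes / 1e9:.2f} GB")    # 1.55 GB
```

Under that assumption, roughly 1.55 GB is allocated to hold about 22 thousand actual values.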

Initial code

My data is currently organized as the following xarray.DataArray:

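import xarray

# dosages is a dense float array; cells without a prescription hold numpy.nan.
# drug_type and drug_class are extra (non-index) coordinates along the 'drug' dimension.
drugs = xarray.DataArray(dosages, coords={'patient': patients, 'time': dates,
                                          'drug': drug_names, 'timing': timings,
                                          'drug_type': ('drug', drug_types),
                                          'drug_class': ('drug', drug_classes)},
                         dims=['patient', 'time', 'drug', 'timing'])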

where dosages.shape = (len(patients), len(dates), len(drug_names), 2). The timing axis corresponds to 'scheduled' vs. 'as needed' dosage. All the missing/zero entries are set to numpy.nan.
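For illustration, a dense array of this shape can be filled roughly as follows. This is a toy sketch: the records, labels and dosages are made up, and only numpy is used.

```python
import numpy as np

# Hypothetical toy records: (patient, date, drug, timing, dosage).
records = [
    ("p1", "2015-01-01", "drugA", "scheduled", 10.0),
    ("p1", "2015-01-01", "drugB", "as needed", 5.0),
    ("p2", "2015-01-02", "drugA", "scheduled", 20.0),
]

# Sorted unique labels for each axis.
patients = sorted({r[0] for r in records})
dates = sorted({r[1] for r in records})
drug_names = sorted({r[2] for r in records})
timings = ["scheduled", "as needed"]

# Dense array, NaN wherever no prescription exists.
dosages = np.full(
    (len(patients), len(dates), len(drug_names), len(timings)), np.nan
)
for patient, date, drug, timing, dose in records:
    dosages[patients.index(patient), dates.index(date),
            drug_names.index(drug), timings.index(timing)] = dose

print(dosages.shape)                               # (2, 2, 2, 2)
print(int(np.count_nonzero(~np.isnan(dosages))))   # 3
```

Even in this toy case only 3 of the 16 cells hold a value; the rest is NaN padding.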

Upvotes: 3

Views: 642

Answers (1)

jlovell

Reputation: 66

Currently (as of version 0.10.2), xarray supports only dense arrays, but there is a GitHub issue (https://github.com/pydata/xarray/issues/1375) requesting sparse array support. A quick look at that issue suggests it is being actively worked on, by enabling xarray to support arrays from the sparse module.

Upvotes: 1
