I want to structure my data (similarly to pandas) to allow easy data exploration. I tried using xarray.DataArray for this task (the recommended way to represent n-dimensional data in pandas, see http://pandas.pydata.org/pandas-docs/stable/dsintro.html#panel4d-and-panelnd-deprecated), but it appears inefficient given that my data is sparse. Is there a better way to structure my data under xarray.DataArray or under another Python data structure to allow easy data exploration?
Description of data
My data consists of prescriptions given to patients. Each entry consists of:
There might be several prescriptions on a date for different patients. A patient might also be prescribed several drugs (e.g., 2-3 drugs) at the same time with 'mandatory' dosage and 'optional/as needed' dosage. My dataset currently consists of 397 different patients, 1520 different dates and 161 different drugs. I only have 21790 non-null entries out of the 397*1520*161*2 entries (i.e., 0.01%).
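To put the sparsity in concrete terms, here is a quick back-of-the-envelope check of the figures above. The 8 bytes per cell is an assumption (a float64 array, which is what a NaN-filled NumPy array would normally be); the counts are the ones quoted in the question.

```python
# Dimensions and non-null count quoted in the question.
n_patients, n_dates, n_drugs, n_timings = 397, 1520, 161, 2
n_filled = 21790

# Total number of cells in the dense 4-D array.
n_cells = n_patients * n_dates * n_drugs * n_timings

# Fraction of cells that actually hold a dosage.
density = n_filled / n_cells

# Assumption: dosages are stored as float64 (8 bytes per cell).
dense_bytes = n_cells * 8

print(f"cells:   {n_cells:,}")
print(f"density: {density:.4%}")
print(f"dense size: {dense_bytes / 1e9:.2f} GB")
```

So the dense array spends on the order of a gigabyte to hold roughly twenty thousand values.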
Initial code
My data is currently organized as the following xarray.DataArray:
drugs = xarray.DataArray(dosages,
                         coords={'patient': patients, 'time': dates,
                                 'drug': drug_names, 'timing': timings,
                                 'drug_type': ('drug', drug_types),
                                 'drug_class': ('drug', drug_classes)},
                         dims=['patient', 'time', 'drug', 'timing'])
where dosages.shape = (len(patients), len(dates), len(drug_names), 2). The timing axis corresponds to 'scheduled' vs. 'as needed' dosage. All the missing/zero entries are set to numpy.nan.
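For anyone who wants to reproduce the layout, here is a minimal self-contained sketch of the same structure on toy data. All patient IDs, dates, drug names, types, classes and dosages below are made up for illustration; only the dimension and coordinate names come from the snippet above.

```python
import numpy as np
import xarray as xr

# Hypothetical toy values; the real data has 397 patients, 1520 dates, 161 drugs.
patients = ["p1", "p2"]
dates = np.array(["2018-01-01", "2018-01-02", "2018-01-03"], dtype="datetime64[ns]")
drug_names = ["drug_a", "drug_b"]
drug_types = ["oral", "iv"]
drug_classes = ["class_x", "class_y"]
timings = ["scheduled", "as_needed"]

# Dense 4-D array, NaN everywhere except the few recorded prescriptions.
dosages = np.full(
    (len(patients), len(dates), len(drug_names), len(timings)), np.nan
)
dosages[0, 0, 0, 0] = 10.0  # p1, 2018-01-01, drug_a, scheduled
dosages[1, 2, 1, 1] = 2.5   # p2, 2018-01-03, drug_b, as needed

drugs = xr.DataArray(
    dosages,
    coords={
        "patient": patients,
        "time": dates,
        "drug": drug_names,
        "timing": timings,
        # Non-dimension coordinates attached to the 'drug' axis.
        "drug_type": ("drug", drug_types),
        "drug_class": ("drug", drug_classes),
    },
    dims=["patient", "time", "drug", "timing"],
)

# Number of non-NaN cells versus total cells.
n_filled = int(drugs.count())
n_cells = drugs.size

# Select all drugs of one class through the non-dimension coordinate.
class_x_idx = np.flatnonzero(drugs["drug_class"].values == "class_x")
class_x = drugs.isel(drug=class_x_idx)

# Label-based selection along two axes.
p1_scheduled = drugs.sel(patient="p1", timing="scheduled")
```

Even in this toy case only 2 of the 24 cells are filled, which is the same problem in miniature.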
Upvotes: 3
Views: 642
Currently (as of version 0.10.2) xarray supports only dense arrays, but there is a GitHub issue (https://github.com/pydata/xarray/issues/1375) requesting sparse array support. A quick look at that issue suggests it is being actively worked on, by enabling xarray to work with arrays from the sparse module.
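In the meantime, one possible workaround (this is a different technique from sparse xarray support, not something from the linked issue) is to keep the data in "long" format in a pandas DataFrame: one row per non-null prescription, so memory grows with the 21790 entries rather than with the full grid. The sketch below uses made-up records and hypothetical column names modelled on the dimensions in the question.

```python
import pandas as pd

# Hypothetical records: one row per actual (non-null) prescription.
records = pd.DataFrame({
    "patient": ["p1", "p1", "p2"],
    "time": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-03"]),
    "drug": ["drug_a", "drug_b", "drug_b"],
    "timing": ["scheduled", "as_needed", "as_needed"],
    "dosage": [10.0, 1.0, 2.5],
})

# Per-drug metadata lives in its own small table (also made up here)
# and is joined on demand, like the non-dimension coordinates in xarray.
drug_info = pd.DataFrame({
    "drug": ["drug_a", "drug_b"],
    "drug_type": ["oral", "iv"],
    "drug_class": ["class_x", "class_y"],
})
records = records.merge(drug_info, on="drug", how="left")

# Exploration via groupby: total dosage per patient and drug class.
per_patient_class = records.groupby(["patient", "drug_class"])["dosage"].sum()

# A small dense view can still be built for one patient when needed.
p1 = records[records["patient"] == "p1"]
dense_slice = p1.set_index(["time", "drug", "timing"])["dosage"].unstack("drug")

print(per_patient_class)
print(dense_slice)
```

The trade-off is that you lose xarray's n-dimensional label indexing on the full dataset, but filtering, grouping and pivoting small slices back into dense form remain straightforward.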
Upvotes: 1