Reputation: 1260
Suppose I have a MultiIndex DataFrame:
                                         c        o        l        u
    major timestamp
    ONE   2019-01-22 18:12:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:13:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:14:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:15:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:16:00  0.00008  0.00008  0.00008  0.00008
    TWO   2019-01-22 18:12:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:13:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:14:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:15:00  0.00008  0.00008  0.00008  0.00008
          2019-01-22 18:16:00  0.00008  0.00008  0.00008  0.00008
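To reproduce the sample, a frame like the one above can be rebuilt as follows. This is a reconstruction from the printout, not the asker's code: the one-minute frequency and the constant 8e-05 are read off the table, and the sample shows only 2 of the categories.

```python
import numpy as np
import pandas as pd

# Reconstructed sample: 2 categories in "major", 5 one-minute timestamps,
# 4 columns named c, o, l, u, every value 0.00008.
majors = ["ONE", "TWO"]
timestamps = pd.date_range("2019-01-22 18:12:00", periods=5, freq="min")
index = pd.MultiIndex.from_product([majors, timestamps], names=["major", "timestamp"])
df = pd.DataFrame(np.full((len(index), 4), 8e-05), index=index, columns=list("colu"))
print(df)
```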
I want to generate a 3-dimensional NumPy array from this DataFrame. The full DataFrame has 15 categories in the major level, 4 columns and a time index of length 5, so I would like a NumPy array of shape (4, 15, 5), denoting (columns, categories, time_index) respectively.
For the two-category sample above, this should create an array of shape (4, 2, 5):
    array([[[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
            [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

           [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
            [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

           [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
            [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

           [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
            [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]]])
One used to be able to do this with pd.Panel:
panel = pd.Panel(items=[columns], major_axis=[categories], minor_axis=[time_index], dtype=np.float32)
...
How can I accomplish this most efficiently with a MultiIndex DataFrame? Thanks.
Upvotes: 15
Views: 10144
Reputation: 31
If the categories have different lengths along the minor (time) axis, you can try this:
df.unstack().ffill().bfill().stack().values.reshape(*df.index.levshape,-1)
It still seems awkward, though. Why was Panel deprecated anyway?
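A note on the one-liner above: on the unstacked frame, ffill()/bfill() work along axis 0, so a missing value is copied from a neighbouring category rather than a neighbouring timestamp, and the reshaped result is ordered (categories, time, columns), so it still needs .transpose(2, 0, 1). The sketch below swaps in a different technique: reindex onto the full product of both index levels, which leaves gaps as NaN, then reshape and transpose. The helper name to_cube is hypothetical.

```python
import numpy as np
import pandas as pd


def to_cube(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: return a (columns, categories, time) array.

    Reindexes onto the full product of both index levels so every category
    has every timestamp (missing rows become NaN), then reshapes.
    """
    idx = df.index.remove_unused_levels()
    cats, times = idx.levels  # sorted distinct labels of each level
    full = pd.MultiIndex.from_product([cats, times], names=idx.names)
    values = df.reindex(full).to_numpy(dtype=float)
    return values.reshape(len(cats), len(times), df.shape[1]).transpose(2, 0, 1)


# Ragged example: category "B" has no row for the third timestamp.
times = pd.date_range("2019-01-22 18:12:00", periods=3, freq="min")
index = pd.MultiIndex.from_tuples(
    [("A", t) for t in times] + [("B", t) for t in times[:2]],
    names=["major", "timestamp"],
)
df = pd.DataFrame(np.arange(5 * 4, dtype=float).reshape(5, 4), index=index, columns=list("colu"))

cube = to_cube(df)
print(cube.shape)     # (4, 2, 3)
print(cube[:, 1, 2])  # all NaN: the missing ("B", third timestamp) row
```

If NaN is not acceptable, a fill can be applied per category (for example with df.groupby(level=0).ffill() on the reindexed frame) before reshaping, which keeps values within their own category.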
Upvotes: 2
Reputation: 879391
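Since df.values is a (15*100, 4)-shaped array, you can call reshape to make it a (15, 100, 4)-shaped array. This relies on the index being sorted by (category, timestamp) and on every category having the same 100 timestamps: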
arr = df.values.reshape(15, 100, 4)
Then call transpose to rearrange the order of the axes:
arr = arr.transpose(2, 0, 1)
Now arr has shape (4, 15, 100).
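To avoid hard-coding 15, 100 and 4, the sizes can be read off the index. The sketch below does that and checks the condition the plain reshape depends on (a unique, sorted index that is a complete product of its two levels); panel_like_array is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def panel_like_array(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical helper: (n0*n1, n_cols) frame -> (n_cols, n0, n1) array.

    Assumes a two-level index that is unique, lexicographically sorted and a
    complete product of its levels; otherwise the reshape can fail or
    misalign values, so those conditions are checked first.
    """
    idx = df.index.remove_unused_levels()
    n0, n1 = idx.levshape  # number of distinct labels in each level
    if not (idx.is_unique and idx.is_monotonic_increasing and len(idx) == n0 * n1):
        raise ValueError("index must be a complete, sorted product of its levels")
    return df.to_numpy().reshape(n0, n1, df.shape[1]).transpose(2, 0, 1)


rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(10, size=(15 * 100, 4)),
    index=pd.MultiIndex.from_product([range(15), range(100)], names=["A", "B"]),
    columns=list("colu"),
)
arr = panel_like_array(df)
print(arr.shape)                              # (4, 15, 100)
print(arr[2, 7, 42] == df.loc[(7, 42), "l"])  # True
```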
Using reshape/transpose is ~960x faster than to_xarray().to_array():
In [21]: df = pd.DataFrame(np.random.randint(10, size=(15*100, 4)), index=pd.MultiIndex.from_product([range(15), range(100)], names=['A','B']), columns=list('colu'))
In [22]: %timeit arr = df.values.reshape(15, 100, 4).transpose(2, 0, 1)
3.31 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [24]: %timeit df.to_xarray().to_array()
3.18 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [25]: 3180/3.31
Out[25]: 960.7250755287009
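A plausible reason for the gap: on a suitably laid-out array, reshape and transpose return views that only change shape and strides, so no data is copied, whereas to_xarray() builds a labelled Dataset. The sketch below illustrates this with a plain NumPy array standing in for df.values; whether df.values itself copies depends on how pandas stores the frame, so treat it as an illustration rather than a profile of the benchmark above.

```python
import numpy as np

# Plain C-contiguous array standing in for a (15*100, 4) df.values.
flat = np.arange(15 * 100 * 4, dtype=float).reshape(15 * 100, 4)

cube = flat.reshape(15, 100, 4).transpose(2, 0, 1)

print(np.shares_memory(flat, cube))  # True: the data was not copied
print(cube.flags["C_CONTIGUOUS"])    # False: only the strides changed
print(cube.strides)                  # (8, 3200, 32) for float64

# If a later step needs contiguous memory, pay for one explicit copy.
dense = np.ascontiguousarray(cube)
print(np.shares_memory(flat, dense))  # False
```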
Upvotes: 17
Reputation: 11657
How about using xarray?
res = df.to_xarray().to_array()
The result is an xarray DataArray of shape (4, 15, 5); call .values on it to get a plain NumPy array.
In fact, the docs now recommend this as an alternative to pandas Panel. Note that you must have the xarray package installed.
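A minimal sketch of the xarray route on the two-category sample, assuming xarray is installed (pandas imports it inside to_xarray()). to_array() stacks the data variables along a new "variable" dimension; transposing by dimension name pins the axis order before taking .values. Because the MultiIndex is expanded into the full product of its levels, with missing combinations filled by NaN, this route also copes with categories of unequal length.

```python
import numpy as np
import pandas as pd

# Sample frame: 2 categories x 5 timestamps, columns c, o, l, u.
majors = ["ONE", "TWO"]
timestamps = pd.date_range("2019-01-22 18:12:00", periods=5, freq="min")
index = pd.MultiIndex.from_product([majors, timestamps], names=["major", "timestamp"])
df = pd.DataFrame(np.full((len(index), 4), 8e-05), index=index, columns=list("colu"))

# Requires the xarray package; the result is an xarray.DataArray.
res = df.to_xarray().to_array()
print(res.dims, res.shape)

# Plain NumPy array with the axis order fixed explicitly by dimension name.
arr = res.transpose("variable", "major", "timestamp").values
print(type(arr), arr.shape)
```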
Upvotes: 12