James

Reputation: 1260

How to convert a pandas MultiIndex DataFrame into a 3D array

Suppose I have a MultiIndex DataFrame:

                                c       o       l       u
major       timestamp                       
ONE         2019-01-22 18:12:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:13:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:14:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:15:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:16:00 0.00008 0.00008 0.00008 0.00008

TWO         2019-01-22 18:12:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:13:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:14:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:15:00 0.00008 0.00008 0.00008 0.00008 
            2019-01-22 18:16:00 0.00008 0.00008 0.00008 0.00008

I want to generate a 3-dimensional NumPy array from this DataFrame. Given that the DataFrame has 15 categories in the major index level, 4 columns, and a time index of length 5, I would like the array to have shape (4, 15, 5), corresponding to (columns, categories, time_index).

For the two categories shown above, the result should look like this:

array([[[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]],

       [[8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05],
        [8.e-05, 8.e-05, 8.e-05, 8.e-05, 8.e-05]]])

One used to be able to do this with pd.Panel:

panel = pd.Panel(items=[columns], major_axis=[categories], minor_axis=[time_index], dtype=np.float32)
... 

How can I do this most efficiently with a MultiIndex DataFrame? Thanks

Upvotes: 15

Views: 10144

Answers (3)

aEgoist

Reputation: 31

If the minor axis has a different length for each category, you can try this:

df.unstack().ffill().bfill().stack().values.reshape(*df.index.levshape,-1)

It still seems awkward, though. Why was Panel deprecated anyway?
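For comparison, here is a minimal sketch of a different way to handle a ragged minor axis: reindex onto the full (major, timestamp) grid so that missing rows become NaN rather than being filled from neighbouring categories, then reshape. This is not the one-liner above but a swapped-in alternative. The sample frame, the labels, and the names `ragged`, `grid` and `arr` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: 2 categories x 3 timestamps x 4 columns
majors = ["ONE", "TWO"]
times = pd.date_range("2019-01-22 18:12", periods=3, freq="min")
full_index = pd.MultiIndex.from_product([majors, times], names=["major", "timestamp"])
df = pd.DataFrame(
    np.arange(24, dtype=float).reshape(6, 4),
    index=full_index,
    columns=list("colu"),
)

# Make the minor axis ragged by dropping the last row, i.e. ("TWO", last timestamp)
ragged = df.iloc[:-1]

# Rebuild the complete grid from the values actually present in each level
major_vals = ragged.index.get_level_values(0).unique()
time_vals = ragged.index.get_level_values(1).unique()
grid = pd.MultiIndex.from_product([major_vals, time_vals], names=ragged.index.names)

# Missing (major, timestamp) pairs become rows of NaN
filled = ragged.reindex(grid)

# (categories, time, columns) -> (columns, categories, time)
arr = filled.to_numpy().reshape(len(major_vals), len(time_vals), -1).transpose(2, 0, 1)
print(arr.shape)
```

Whether NaN padding or forward/backward filling is appropriate depends on the data; the sketch only shows the padding variant.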

Upvotes: 2

unutbu

Reputation: 879391

Since df.values is a (15*100, 4)-shaped array, you can call reshape to make it a (15, 100, 4)-shaped array:

arr = df.values.reshape(15, 100, 4)

Then call transpose to rearrange the order of the axes:

arr = arr.transpose(2, 0, 1)

Now arr has shape (4, 15, 100).
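Here is a self-contained sketch of the same two steps using the dimensions from the question (15 categories, 5 timestamps, 4 columns). The category labels and values are invented for illustration. It assumes the index is a complete product of categories and timestamps and is sorted by (major, timestamp); otherwise the reshape would silently mix up rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the question's dimensions
majors = [f"CAT{i:02d}" for i in range(15)]  # made-up category labels
times = pd.date_range("2019-01-22 18:12", periods=5, freq="min")
index = pd.MultiIndex.from_product([majors, times], names=["major", "timestamp"])
df = pd.DataFrame(
    np.arange(15 * 5 * 4, dtype=float).reshape(-1, 4),
    index=index,
    columns=list("colu"),
)

# Ensure rows are grouped by major, then ordered by timestamp
df = df.sort_index()

# Number of distinct values in each index level: (15, 5)
n_major, n_time = df.index.levshape

# (15*5, 4) -> (15, 5, 4) -> (4, 15, 5)
arr = df.to_numpy().reshape(n_major, n_time, -1).transpose(2, 0, 1)
print(arr.shape)  # (4, 15, 5)
```

Reading the sizes from `df.index.levshape` avoids hard-coding 15 and 5. Note that `transpose` returns a view, so `arr[k, i, j]` is just column `k` of row `i * n_time + j` in the original values.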


Using reshape/transpose is ~960x faster than to_xarray().to_array():

In [21]: df = pd.DataFrame(np.random.randint(10, size=(15*100, 4)), index=pd.MultiIndex.from_product([range(15), range(100)], names=['A','B']), columns=list('colu'))

In [22]: %timeit arr = df.values.reshape(15, 100, 4).transpose(2, 0, 1)
3.31 µs ± 23.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [24]: %timeit df.to_xarray().to_array()
3.18 ms ± 24.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [25]: 3180/3.31
Out[25]: 960.7250755287009

Upvotes: 17

Josh Friedlander

Reputation: 11657

How about using xarray?

res = df.to_xarray().to_array()

The result is an array of shape (4, 15, 5).

In fact the docs now recommend this as an alternative to pandas Panel. Note that you must have the xarray package installed.
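As a small sketch of what this looks like end to end (sample labels and values invented for illustration, xarray assumed to be installed): `to_array()` returns an `xarray.DataArray` rather than a plain NumPy array, so `.values` is needed if an `ndarray` is what you are after.

```python
import numpy as np
import pandas as pd

# Hypothetical data with the question's dimensions
majors = [f"CAT{i:02d}" for i in range(15)]  # made-up category labels
times = pd.date_range("2019-01-22 18:12", periods=5, freq="min")
index = pd.MultiIndex.from_product([majors, times], names=["major", "timestamp"])
df = pd.DataFrame(
    np.arange(15 * 5 * 4, dtype=float).reshape(-1, 4),
    index=index,
    columns=list("colu"),
)

# DataFrame -> Dataset (one variable per column) -> DataArray
res = df.to_xarray().to_array()

# The columns become a leading dimension, followed by the two index levels
print(res.dims)

# Plain NumPy array of shape (columns, categories, time)
arr = res.values
print(arr.shape)
```

Unlike a raw reshape, this route aligns on the index labels, so it does not depend on the rows being in any particular order.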

Upvotes: 12
