Jonas Byström

Reputation: 26129

Extract numpy 3D array

Assuming I have this pandas DataFrame:

>>> import pandas as pd, numpy as np
>>> df1 = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1], [np.nan, np.nan, np.nan, 5]], columns=list('ABCD') )
>>> df = pd.concat([df1,df1], keys='EF', axis=1)
>>> df
     E                F
     A    B   C  D    A    B   C  D
0  NaN  2.0 NaN  0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5  NaN  NaN NaN  5

How can I convert it into a 3D numpy array with shape (3, 2, 4)?

Upvotes: 1

Views: 210

Answers (2)

piRSquared

Reputation: 294218

This is a generic approach. It reindexes the columns to the full Cartesian product of the column levels, which squares up the values, so it also handles levels of different sizes than yours and more than two levels.

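# Build the full Cartesian product of the column levels
cols = pd.MultiIndex.from_product(df.columns.levels, names=df.columns.names)
# Reindex so that every combination exists; missing ones are filled with NaN
d = df.reindex(columns=cols)
# Keep the rows on axis 0 and give each column level its own axis
v = d.values.reshape((len(d),) + tuple(l.size for l in cols.levels))
v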

array([[[ nan,   2.,  nan,   0.],
        [ nan,   2.,  nan,   0.]],

       [[  3.,   4.,  nan,   1.],
        [  3.,   4.,  nan,   1.]],

       [[ nan,  nan,  nan,   5.],
        [ nan,  nan,  nan,   5.]]])
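
The steps above can be wrapped in a small helper. This is only a sketch, not part of the original answer: the function name `columns_to_ndarray` is hypothetical, it assumes the column MultiIndex has no duplicate entries, and it passes `keys` as a list rather than a string. Note that the trailing axes follow the sorted level order, which need not match the original column order.

```python
import numpy as np
import pandas as pd


def columns_to_ndarray(df):
    """Hypothetical helper: rows on axis 0, one extra axis per column level."""
    # Full Cartesian product of the column levels.
    cols = pd.MultiIndex.from_product(df.columns.levels, names=df.columns.names)
    # Combinations that are absent from df become all-NaN columns.
    d = df.reindex(columns=cols)
    shape = (len(d),) + tuple(level.size for level in cols.levels)
    return d.to_numpy().reshape(shape)


df1 = pd.DataFrame(
    [[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1], [np.nan, np.nan, np.nan, 5]],
    columns=list("ABCD"),
)
df = pd.concat([df1, df1], keys=["E", "F"], axis=1)
v = columns_to_ndarray(df)
print(v.shape)  # (3, 2, 4)
```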

If you want the axes in a different order, you can transpose v:

v.transpose(1, 0, 2)

array([[[ nan,   2.,  nan,   0.],
        [  3.,   4.,  nan,   1.],
        [ nan,  nan,  nan,   5.]],

       [[ nan,   2.,  nan,   0.],
        [  3.,   4.,  nan,   1.],
        [ nan,  nan,  nan,   5.]]])
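
To make the axis bookkeeping concrete, here is a small sketch on a stand-in array (the `np.arange` data is an assumption used purely for illustration, not the DataFrame values). It shows how `transpose(1, 0, 2)` permutes the shape and the indices, and that the result is a view rather than a copy.

```python
import numpy as np

# Stand-in for the (rows, level 0, level 1) array from above.
v = np.arange(24).reshape(3, 2, 4)
# New axis order: (level 0, rows, level 1).
t = v.transpose(1, 0, 2)

print(t.shape)                    # (2, 3, 4)
# Element [i, j, k] of the transposed array is element [j, i, k] of the original.
print(t[1, 2, 3] == v[2, 1, 3])   # True
# transpose returns a view, so no data is copied.
print(np.shares_memory(t, v))     # True
```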

Here is a more complicated example: three-level MultiIndex columns in which some combinations of level values are missing.

np.random.seed([3, 1415])
df = pd.DataFrame(
    np.random.randint(10, size=(4, 7)),
    columns=pd.MultiIndex.from_tuples([
        ('A', 'X', 'Yes'),
        ('A', 'X', 'No'),
        ('A', 'Y', 'No'),
        ('B', 'X', 'Yes'),
        ('B', 'Z', 'Yes'),
        ('C', 'Y', 'No'),
        ('C', 'Z', 'No')
    ])
)

df

    A         B      C   
    X     Y   X   Z  Y  Z
  Yes No No Yes Yes No No
0   0  2  7   3   8  7  0
1   6  8  6   0   2  0  4
2   9  7  3   2   4  3  3
3   6  7  7   4   5  3  7

Applying the same reindex as above gives

d

   A                      B                       C                  
   X      Y       Z       X       Y       Z       X      Y      Z    
  No Yes No Yes  No Yes  No Yes  No Yes  No Yes  No Yes No Yes No Yes
0  2   0  7 NaN NaN NaN NaN   3 NaN NaN NaN   8 NaN NaN  7 NaN  0 NaN
1  8   6  6 NaN NaN NaN NaN   0 NaN NaN NaN   2 NaN NaN  0 NaN  4 NaN
2  7   9  3 NaN NaN NaN NaN   2 NaN NaN NaN   4 NaN NaN  3 NaN  3 NaN
3  7   6  7 NaN NaN NaN NaN   4 NaN NaN NaN   5 NaN NaN  3 NaN  7 NaN

The reindex has filled in the missing combinations with NaN, so the values now form a rectangular array that can be reshaped.

I'll transpose v so it's easier to read:

v.transpose(3, 1, 2, 0)

array([[[[  2.,   8.,   7.,   7.],
         [  7.,   6.,   3.,   7.],
         [ nan,  nan,  nan,  nan]],

        [[ nan,  nan,  nan,  nan],
         [ nan,  nan,  nan,  nan],
         [ nan,  nan,  nan,  nan]],

        [[ nan,  nan,  nan,  nan],
         [  7.,   0.,   3.,   3.],
         [  0.,   4.,   3.,   7.]]],


       [[[  0.,   6.,   9.,   6.],
         [ nan,  nan,  nan,  nan],
         [ nan,  nan,  nan,  nan]],

        [[  3.,   0.,   2.,   4.],
         [ nan,  nan,  nan,  nan],
         [  8.,   2.,   4.,   5.]],

        [[ nan,  nan,  nan,  nan],
         [ nan,  nan,  nan,  nan],
         [ nan,  nan,  nan,  nan]]]])
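
Once the array is squared up, a column label maps to a position along each axis through the corresponding level. The sketch below illustrates this with a hypothetical `lookup` helper and deterministic stand-in data (`np.arange` instead of the random integers above, so the printed values can be checked by hand); neither is part of the original answer.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("A", "X", "Yes"), ("A", "X", "No"), ("A", "Y", "No"),
    ("B", "X", "Yes"), ("B", "Z", "Yes"),
    ("C", "Y", "No"), ("C", "Z", "No"),
])
# Stand-in data: the value at row r, original column c is 7 * r + c.
df = pd.DataFrame(np.arange(28).reshape(4, 7), columns=columns)

cols = pd.MultiIndex.from_product(df.columns.levels, names=df.columns.names)
v = df.reindex(columns=cols).to_numpy().reshape(
    (len(df),) + tuple(level.size for level in cols.levels)
)


def lookup(row, *labels):
    """Hypothetical helper: translate column labels into positions in v."""
    positions = tuple(
        level.get_loc(label) for level, label in zip(cols.levels, labels)
    )
    return v[(row,) + positions]


print(v.shape)                     # (4, 3, 3, 2)
print(lookup(1, "B", "Z", "Yes"))  # 11.0 (row 1, original column 4)
print(lookup(1, "A", "Z", "No"))   # nan (combination absent from df)
```

Of the 3 * 3 * 2 = 18 possible combinations only 7 exist in df, so the other 11 slots per row hold NaN.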

Upvotes: 0

javidcf

Reputation: 59681

You can just reshape the values of the data frame:

import pandas as pd
import numpy as np

df1 = pd.DataFrame([[np.nan,      2, np.nan, 0],
                    [3,           4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5]],
                   columns=list('ABCD') )
df = pd.concat([df1, df1], keys='EF', axis=1)
# Not guaranteed to be a view of df: the columns here have mixed dtypes
# (float and int), so df.values already builds a new array
df_three_dim = df.values.reshape((3, 2, 4))
# An explicit copy, independent of df in every case
df_three_dim_copy = df.values.reshape((3, 2, 4)).copy()
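
As a variation on the snippet above, the shape can be derived from the column index instead of being hard-coded, and `to_numpy(copy=True)` makes the independence from df explicit. This is a sketch under two assumptions: every combination of level values is present, and the columns are already ordered level by level, as they are in this df. The names `shape` and `arr` are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1], [np.nan, np.nan, np.nan, 5]],
    columns=list("ABCD"),
)
df = pd.concat([df1, df1], keys=["E", "F"], axis=1)

# levshape holds the number of labels per column level, here (2, 4).
shape = (len(df),) + df.columns.levshape
# copy=True guarantees an array that does not share memory with df.
arr = df.to_numpy(copy=True).reshape(shape)

arr[0, 0, 1] = 99.0    # modify the array only
print(arr.shape)       # (3, 2, 4)
print(df.iloc[0, 1])   # 2.0, df is unchanged
```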

Upvotes: 1
