Reputation: 26129
Assuming I have this pandas DataFrame:
>>> import pandas as pd, numpy as np
>>> df1 = pd.DataFrame([[np.nan, 2, np.nan, 0], [3, 4, np.nan, 1], [np.nan, np.nan, np.nan, 5]], columns=list('ABCD') )
>>> df = pd.concat([df1,df1], keys='EF', axis=1)
>>> df
     E                F
     A    B   C  D    A    B   C  D
0  NaN  2.0 NaN  0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5  NaN  NaN NaN  5
How can I convert it into a 3D NumPy array with shape (3, 2, 4)?
Upvotes: 1
Views: 210
Reputation: 294218
This is a generic approach: it squares up the values into a full rectangular grid, so it also handles levels of different sizes than yours and column indexes with more than two levels.
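# Build the full Cartesian product of the column levels
cols = pd.MultiIndex.from_product(df.columns.levels, names=df.columns.names)
# Reindex so every level combination exists; missing ones become NaN columns
d = df.reindex(columns=cols)
# One axis for the rows, then one axis per column level
v = d.values.reshape((len(d),) + tuple(l.size for l in cols.levels))
v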
array([[[ nan, 2., nan, 0.],
[ nan, 2., nan, 0.]],
[[ 3., 4., nan, 1.],
[ 3., 4., nan, 1.]],
[[ nan, nan, nan, 5.],
[ nan, nan, nan, 5.]]])
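The steps above can be wrapped into a small function. This is only a sketch: the name columns_to_ndarray is hypothetical (it is not a pandas API), and it assumes the columns are a MultiIndex with no duplicate entries.

```python
import numpy as np
import pandas as pd


def columns_to_ndarray(df):
    """Hypothetical helper: one axis for the rows, one per column level."""
    # Full Cartesian product of the column levels
    full = pd.MultiIndex.from_product(df.columns.levels, names=df.columns.names)
    # Missing level combinations become all-NaN columns
    squared = df.reindex(columns=full)
    shape = (len(squared),) + tuple(level.size for level in full.levels)
    return squared.to_numpy().reshape(shape)


df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5]],
                   columns=list('ABCD'))
df = pd.concat([df1, df1], keys=['E', 'F'], axis=1)

v = columns_to_ndarray(df)
print(v.shape)
```

For the frame in the question, v[:, 0, :] holds the 'E' block and v[:, 1, :] the 'F' block.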
If you want to rearrange the axes of v, you can transpose it:
v.transpose(1, 0, 2)
array([[[ nan, 2., nan, 0.],
[ 3., 4., nan, 1.],
[ nan, nan, nan, 5.]],
[[ nan, 2., nan, 0.],
[ 3., 4., nan, 1.],
[ nan, nan, nan, 5.]]])
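As a quick check of what that transpose means, here is a sketch that rebuilds the question's frame and compares the first block of the transposed array with the 'E' sub-frame. It reshapes directly, which assumes the columns are already in product order, as they are in the question. The names by_key, first_block_is_E and same_as_moveaxis are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5]],
                   columns=list('ABCD'))
df = pd.concat([df1, df1], keys=['E', 'F'], axis=1)

# Axes: (row, outer key, inner column)
v = df.to_numpy().reshape(3, 2, 4)

# Axes after the transpose: (outer key, row, inner column)
by_key = v.transpose(1, 0, 2)

# by_key[0] is the whole 'E' block, by_key[1] the 'F' block
first_block_is_E = np.allclose(by_key[0], df['E'].to_numpy(), equal_nan=True)

# np.moveaxis expresses the same rearrangement
same_as_moveaxis = np.allclose(by_key, np.moveaxis(v, 1, 0), equal_nan=True)

print(by_key.shape, first_block_is_E, same_as_moveaxis)
```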
A more complicated example, with a 3-level MultiIndex on the columns that is missing some level combinations:
np.random.seed([3, 1415])
df = pd.DataFrame(
np.random.randint(10, size=(4, 7)),
columns=pd.MultiIndex.from_tuples([
('A', 'X', 'Yes'),
('A', 'X', 'No'),
('A', 'Y', 'No'),
('B', 'X', 'Yes'),
('B', 'Z', 'Yes'),
('C', 'Y', 'No'),
('C', 'Z', 'No')
])
)
df
     A              B         C
     X         Y    X    Z    Y    Z
   Yes   No   No  Yes  Yes   No   No
0    0    2    7    3    8    7    0
1    6    8    6    0    2    0    4
2    9    7    3    2    4    3    3
3    6    7    7    4    5    3    7
Applying the same steps as above gives d:
    A                       B                       C
    X       Y       Z       X       Y       Z       X       Y       Z
   No Yes  No Yes  No Yes  No Yes  No Yes  No Yes  No Yes  No Yes  No Yes
0   2   0   7 NaN NaN NaN NaN   3 NaN NaN NaN   8 NaN NaN   7 NaN   0 NaN
1   8   6   6 NaN NaN NaN NaN   0 NaN NaN NaN   2 NaN NaN   0 NaN   4 NaN
2   7   9   3 NaN NaN NaN NaN   2 NaN NaN NaN   4 NaN NaN   3 NaN   3 NaN
3   7   6   7 NaN NaN NaN NaN   4 NaN NaN NaN   5 NaN NaN   3 NaN   7 NaN
This fills the missing level combinations with NaN, so the data now form a rectangular array that can be reshaped.
And I'll transpose v so it's easier to look at:
v.transpose(3, 1, 2, 0)
array([[[[ 2., 8., 7., 7.],
[ 7., 6., 3., 7.],
[ nan, nan, nan, nan]],
[[ nan, nan, nan, nan],
[ nan, nan, nan, nan],
[ nan, nan, nan, nan]],
[[ nan, nan, nan, nan],
[ 7., 0., 3., 3.],
[ 0., 4., 3., 7.]]],
[[[ 0., 6., 9., 6.],
[ nan, nan, nan, nan],
[ nan, nan, nan, nan]],
[[ 3., 0., 2., 4.],
[ nan, nan, nan, nan],
[ 8., 2., 4., 5.]],
[[ nan, nan, nan, nan],
[ nan, nan, nan, nan],
[ nan, nan, nan, nan]]]])
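Once the array is built, a column can be looked up by label by translating each label into its position within the corresponding level. The sketch below hard-codes the values printed in the table above rather than relying on the random seed; lookup is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ('A', 'X', 'Yes'), ('A', 'X', 'No'), ('A', 'Y', 'No'),
    ('B', 'X', 'Yes'), ('B', 'Z', 'Yes'),
    ('C', 'Y', 'No'), ('C', 'Z', 'No'),
])
# Values copied from the printed frame above
df = pd.DataFrame([[0, 2, 7, 3, 8, 7, 0],
                   [6, 8, 6, 0, 2, 0, 4],
                   [9, 7, 3, 2, 4, 3, 3],
                   [6, 7, 7, 4, 5, 3, 7]], columns=columns)

cols = pd.MultiIndex.from_product(df.columns.levels, names=df.columns.names)
d = df.reindex(columns=cols)
v = d.to_numpy().reshape((len(d),) + tuple(l.size for l in cols.levels))


def lookup(*labels):
    """Hypothetical helper: all rows for one (level0, level1, level2) label triple."""
    positions = tuple(level.get_loc(label)
                      for level, label in zip(cols.levels, labels))
    return v[(slice(None),) + positions]


b_z_yes = lookup('B', 'Z', 'Yes')   # a combination present in df
a_z_no = lookup('A', 'Z', 'No')     # a combination absent from df
print(b_z_yes, a_z_no)
```

Combinations that were absent from the original columns come back as all-NaN vectors, which is how they can be told apart from real data.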
Upvotes: 0
Reputation: 59681
You can just reshape the values of the DataFrame:
import pandas as pd
import numpy as np
df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
[3, 4, np.nan, 1],
[np.nan, np.nan, np.nan, 5]],
columns=list('ABCD') )
df = pd.concat([df1, df1], keys='EF', axis=1)
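# df.values can only be a view of df's data when all columns share one dtype.
# Here column D is integer and the others are float, so df.values builds a
# new float64 array and changing it does not change df
df_three_dim = df.values.reshape((3, 2, 4))
# An explicit copy is independent of df in every case
df_three_dim_copy = df.values.reshape((3, 2, 4)).copy()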
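Whether the reshaped array shares memory with the DataFrame depends on the column dtypes and on the pandas version, so it may be safer not to rely on it. A minimal sketch, assuming the frame from the question, that requests an independent array explicitly with to_numpy(copy=True); the names n_dtypes and df_unchanged are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[np.nan, 2, np.nan, 0],
                    [3, 4, np.nan, 1],
                    [np.nan, np.nan, np.nan, 5]],
                   columns=list('ABCD'))
df = pd.concat([df1, df1], keys=['E', 'F'], axis=1)

# Column D is integer, the others are float: two distinct dtypes
n_dtypes = df.dtypes.nunique()

# copy=True asks pandas for an array that does not share data with df
arr = df.to_numpy(copy=True).reshape(3, 2, 4)

# Writing to the array leaves the DataFrame untouched
arr[0, 0, 1] = 99.0
df_unchanged = df.iloc[0, 1] == 2.0

print(n_dtypes, arr.dtype, df_unchanged)
```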
Upvotes: 1