Reputation: 4354
I am trying to convert a multi-index pandas DataFrame
into a numpy.ndarray
. The DataFrame is below:
s1 s2 s3 s4
Action State
1 s1 0.0 0 0.8 0.2
s2 0.1 0 0.9 0.0
2 s1 0.0 0 0.9 0.1
s2 0.0 0 1.0 0.0
I would like the resulting numpy.ndarray
to be the following with np.shape() = (2,2,4)
:
[[[ 0.0 0.0 0.8 0.2 ]
[ 0.1 0.0 0.9 0.0 ]]
[[ 0.0 0.0 0.9 0.1 ]
[ 0.0 0.0 1.0 0.0]]]
I have tried df.as_matrix()
but this returns:
[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]
[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]
How do I return a list of lists for the first level with each list representing an Action
records.
Upvotes: 6
Views: 6137
Reputation: 2164
If you are just trying to pull out one column, say s1, and get an array with shape (2,2) you can use the .index.levshape
like this:
x = df.s1.to_numpy().reshape(df.index.levshape)
This will give you a (2,2) containing the value of s1.
Upvotes: 0
Reputation: 1
Elaborating on Brad Solomon's answer, to get a sligthly more generic solution - indexes of different sizes and an unfixed number of indexes - one could do something like this:
def df_to_numpy(df):
try:
shape = [len(level) for level in df.index.levels]
except AttributeError:
shape = [len(df.index)]
ncol = df.shape[-1]
if ncol > 1:
shape.append(ncol)
return df.to_numpy().reshape(shape)
If df
has missing sub-indexes reshape
will not work. One way to add them would be (maybe there are better solutions):
def enforce_df_shape(df):
try:
ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
except AttributeError:
return df
fulldf = pd.DataFrame(-1, columns=df.columns, index=ind) # remove -1 to fill fulldf with nan
fulldf.update(df)
return fulldf
Upvotes: 0
Reputation: 4354
Using Divakar's suggestion, np.reshape()
worked:
>>> print(P)
s1 s2 s3 s4
Action State
1 s1 0.0 0 0.8 0.2
s2 0.1 0 0.9 0.0
2 s1 0.0 0 0.9 0.1
s2 0.0 0 1.0 0.0
>>> np.reshape(P,(2,2,-1))
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
>>> np.shape(P)
(2, 2, 4)
Upvotes: 0
Reputation: 40918
You could use the following:
dim = len(df.index.get_level_values(0).unique())
result = df.values.reshape((dim1, dim1, df.shape[1]))
print(result)
[[[ 0. 0. 0.8 0.2]
[ 0.1 0. 0.9 0. ]]
[[ 0. 0. 0.9 0.1]
[ 0. 0. 1. 0. ]]]
The first line just finds the number of groups that you want to groupby.
Why this (or groupby) is needed: as soon as you use .values
, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
Upvotes: 5
Reputation: 77027
One way
In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2],
[0.1, 0.0, 0.9, 0.0]],
[[0.0, 0.0, 0.9, 0.1],
[0.0, 0.0, 1.0, 0.0]]], dtype=object)
Upvotes: 1