LNRD.CLL

Reputation: 385

np reshape within pandas apply

The code below raises an exception: Data must be 1-dimensional.

I'll present the problem with a toy example to be clear.

import pandas as pd
import numpy as np

Initial dataframe:

df = pd.DataFrame({"A": [[10,15,12,14],[20,30,10,43]], "R":[2,2] ,"C":[2,2]})
>>> df

       A                    C   R
0   [10, 15, 12, 14]    2   2
1   [20, 30, 10, 43]    2   2

Conversion to numpy array and reshape:

df['A'] = df['A'].apply(lambda x: np.array(x))
df.apply(lambda x: print(x['A'], (x['R'], x['C'])), axis=1)
df['A_reshaped'] = df.apply(lambda x: np.reshape(x['A'], (x['R'], x['C'])), axis=1)
Expected dataframe:

       A                    C    R           A_reshaped
0   [10, 15, 12, 14]    2   2        [[10,15],[12,14]]
1   [20, 30, 10, 43]    2   2        [[20,30],[10,43]]

Does anyone know the reason? It seems that pandas does not accept 2-dimensional arrays in cells, which is strange...

Thanks in advance for any help!!!

Upvotes: 2

Views: 495

Answers (1)

Ami Tavory

Reputation: 76336

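Using apply directly doesn't work: the function returns a 2-D NumPy array for each row, and pandas apparently tries to combine those return values into a new Series or DataFrame instead of storing each array as a single cell, which is where the "must be 1-dimensional" error comes from.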

This seems to work, though:

df['reshaped'] = pd.Series([a.reshape((r, c)) for (a, r, c) in zip(df.A, df.R, df.C)])

>>> df
                  A  C  R              reshaped
0  [10, 15, 12, 14]  2  2  [[10, 15], [12, 14]]
1  [20, 30, 10, 43]  2  2  [[20, 30], [10, 43]]
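
For completeness, here is a self-contained sketch of the same idea that should run on its own. It is a variant, not exactly the line above: it fills a pre-allocated object array one cell at a time, so it does not depend on how pandas infers a list of nested arrays, and it reuses df.index so the values line up even when the index is not 0..n-1. The second row (six values, R=2, C=3) and the name `cells` are made up for illustration, to show that the (rows, columns) order matters when the shape is not square.

```python
import numpy as np
import pandas as pd

# Toy data; the second row (6 values, 2 x 3) is hypothetical and only
# there so that swapping rows and columns would give a different shape.
df = pd.DataFrame({
    "A": [[10, 15, 12, 14], [1, 2, 3, 4, 5, 6]],
    "R": [2, 2],
    "C": [2, 3],
})

# Pre-allocate a 1-D object array so each cell holds one whole 2-D array.
cells = np.empty(len(df), dtype=object)
for i, (a, r, c) in enumerate(zip(df["A"], df["R"], df["C"])):
    cells[i] = np.asarray(a).reshape(r, c)

# Reuse df's index so the new column aligns row by row.
df["reshaped"] = pd.Series(cells, index=df.index)

print(df["reshaped"].iloc[1].shape)  # (2, 3)
```

The resulting column has object dtype, so each cell is an independent array and the rows may have different shapes.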

Upvotes: 2
