Murcielago
Murcielago

Reputation: 1005

pandas: NaN result with loc[]

I am trying to built a dataframe from one existing dataframe looking like that, I now know that the pivot method can do the trick but I am wondering why it would not work referencing the 'obs' column for the 'Quantity' column in a new df. Feel like it could be useful to know in some situation in addition to pivot.

       Id  Quantity  obs
1  100121     113.0    0
2  100121     104.0    1
3  100121     -11.0    2
4  100122     220.0    0
5  100122     167.0    1
6  100122     100.0    2

I want to achieve the following result:

         Id  m1   m2    m3
1    100121 -11  104.0  113
4    100122  100 167    220

I tried doing this:

df_histo = pd.DataFrame(data11['Id'].drop_duplicates(keep='first'))
df_histo['m1'] = data11['Quantity'].loc[data11['obs']==2]
df_histo['m2'] = data11['Quantity'].loc[data11['obs']==1]
df_histo['m3'] = data11['Quantity'].loc[data11['obs']==0]

for some reason I keep getting this output:

         Id  m1   m2    m3
1    100121  NaN  NaN   NaN
4    100122  NaN  NaN   NaN

What am I missing out here? I can't see where I am messing up

Upvotes: 2

Views: 761

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

IIUC,

df.pivot('Id','obs','Quantity').rename(columns=lambda x: f'm{x+1}')

Output:

obs        m1     m2     m3
Id                         
100121  113.0  104.0  -11.0
100122  220.0  167.0  100.0

You are missing the concept of intrinsic data alignment, meaning that pandas wants to align data using indexes.

To fix your code you need to strip the indexing from the right side of your statements converting the pd.Series into a 1D numpy array.

df_histo = pd.DataFrame(data11['Id'].drop_duplicates(keep='first'))
df_histo['m1'] = data11['Quantity'].loc[data11['obs']==2].to_numpy()
df_histo['m2'] = data11['Quantity'].loc[data11['obs']==1].to_numpy()
df_histo['m3'] = data11['Quantity'].loc[data11['obs']==0].to_numpy()

Output:

       Id     m1     m2     m3
1  100121  -11.0  104.0  113.0
4  100122  100.0  167.0  220.0

However, this is still not the best way to do this indexing and selection, a better way is like this:

df_histo['m1'] = data11.loc[data11['obs'] == 2, 'Quantity'].to_numpy()
df_histo['m2'] = data11.loc[data11['obs'] == 1, 'Quantity'].to_numpy()
df_histo['m3'] = data11.loc[data11['obs'] == 0, 'Quantity'].to_numpy()
df_histo

Output:

       Id     m1     m2     m3
1  100121  -11.0  104.0  113.0
4  100122  100.0  167.0  220.0

Upvotes: 2

Related Questions