Reputation: 1005
I am trying to built a dataframe from one existing dataframe looking like that, I now know that the pivot method can do the trick but I am wondering why it would not work referencing the 'obs' column for the 'Quantity' column in a new df. Feel like it could be useful to know in some situation in addition to pivot.
Id Quantity obs
1 100121 113.0 0
2 100121 104.0 1
3 100121 -11.0 2
4 100122 220.0 0
5 100122 167.0 1
6 100122 100.0 2
I want to achieve the following result:
Id m1 m2 m3
1 100121 -11 104.0 113
4 100122 100 167 220
I tried doing this:
df_histo = pd.DataFrame(data11['Id'].drop_duplicates(keep='first'))
df_histo['m1'] = data11['Quantity'].loc[data11['obs']==2]
df_histo['m2'] = data11['Quantity'].loc[data11['obs']==1]
df_histo['m3'] = data11['Quantity'].loc[data11['obs']==0]
for some reason I keep getting this output:
Id m1 m2 m3
1 100121 NaN NaN NaN
4 100122 NaN NaN NaN
What am I missing out here? I can't see where I am messing up
Upvotes: 2
Views: 761
Reputation: 153460
IIUC,
df.pivot('Id','obs','Quantity').rename(columns=lambda x: f'm{x+1}')
Output:
obs m1 m2 m3
Id
100121 113.0 104.0 -11.0
100122 220.0 167.0 100.0
You are missing the concept of intrinsic data alignment, meaning that pandas wants to align data using indexes.
To fix your code you need to strip the indexing from the right side of your statements converting the pd.Series into a 1D numpy array.
df_histo = pd.DataFrame(data11['Id'].drop_duplicates(keep='first'))
df_histo['m1'] = data11['Quantity'].loc[data11['obs']==2].to_numpy()
df_histo['m2'] = data11['Quantity'].loc[data11['obs']==1].to_numpy()
df_histo['m3'] = data11['Quantity'].loc[data11['obs']==0].to_numpy()
Output:
Id m1 m2 m3
1 100121 -11.0 104.0 113.0
4 100122 100.0 167.0 220.0
However, this is still not the best way to do this indexing and selection, a better way is like this:
df_histo['m1'] = data11.loc[data11['obs'] == 2, 'Quantity'].to_numpy()
df_histo['m2'] = data11.loc[data11['obs'] == 1, 'Quantity'].to_numpy()
df_histo['m3'] = data11.loc[data11['obs'] == 0, 'Quantity'].to_numpy()
df_histo
Output:
Id m1 m2 m3
1 100121 -11.0 104.0 113.0
4 100122 100.0 167.0 220.0
Upvotes: 2