Reputation: 521
First the data:
df
City Date Sex Weight
0 A 6/12/2015 M 185
1 A 6/12/2015 F 120
2 A 7/12/2015 M 210
3 A 7/12/2015 F 105
4 B 6/12/2015 M 225
5 B 6/12/2015 F 155
6 B 6/19/2015 M 167
7 B 6/19/2015 F 121
I am trying to subtract two weights, male-female. I am able to group the data and select the weights for each sex but am unable to simply create a new variable "wt_diff" and have the "wt_diff" appear on each row regardless of sex so that each city/date/sex group would in fact have, on the same row, the weight diff between the sexes.
I am looking to have this output:
df_new
City Date Sex Weight Wt_Diff
0 A 6/12/2015 M 185 65
1 A 6/12/2015 F 120 65
2 A 7/12/2015 M 210 105
3 A 7/12/2015 F 105 105
4 B 6/12/2015 M 225 70
5 B 6/12/2015 F 155 70
6 B 6/19/2015 M 167 46
7 B 6/19/2015 F 121 46
I can get the weight diffs by using this:
def diffw(df):
return(np.diff(df.Weight)*-1)
gb = ['Date', 'City']
gb=df.groupby(gb).apply(diffw)
gb
Date City
6/12/2015 A [65]
B [70]
6/19/2015 B [46]
7/12/2015 A [105]
dtype: object
I am just at a loss on how to get the wt_diffs back to the original df on each row.
Many thanks for any help . . . John
Upvotes: 1
Views: 223
Reputation: 77961
You can use GroupBy.transform
:
>>> f = df.groupby(['City', 'Date'])['Weight'].transform
>>> df['Wt_Diff'] = f('max') - f('min')
>>> df
City Date Sex Weight Wt_Diff
0 A 6/12/2015 M 185 65
1 A 6/12/2015 F 120 65
2 A 7/12/2015 M 210 105
3 A 7/12/2015 F 105 105
4 B 6/12/2015 M 225 70
5 B 6/12/2015 F 155 70
6 B 6/19/2015 M 167 46
7 B 6/19/2015 F 121 46
Edit: if max - min
does not work, the easiest thing would be to add signed weight column first:
>>> df['+/-Weight'] = df['Weight'].where(df['Sex'] == 'M', -df['Weight'])
>>> df['Wt_Diff'] = df.groupby(['City', 'Date'])['+/-Weight'].transform('sum')
Upvotes: 2