John
John

Reputation: 521

performing math on dataframe variables after groupby in pandas and bringing results back to original dataframe

First the data:

df
   City   Date     Sex  Weight
0    A  6/12/2015   M     185
1    A  6/12/2015   F     120
2    A  7/12/2015   M     210
3    A  7/12/2015   F     105
4    B  6/12/2015   M     225
5    B  6/12/2015   F     155
6    B  6/19/2015   M     167
7    B  6/19/2015   F     121

I am trying to subtract two weights, male-female. I am able to group the data and select the weights for each sex but am unable to simply create a new variable "wt_diff" and have the "wt_diff" appear on each row regardless of sex so that each city/date/sex group would in fact have, on the same row, the weight diff between the sexes.

I am looking to have this output:

df_new
City   Date     Sex  Weight     Wt_Diff
0    A  6/12/2015   M     185     65
1    A  6/12/2015   F     120     65
2    A  7/12/2015   M     210    105
3    A  7/12/2015   F     105    105
4    B  6/12/2015   M     225     70
5    B  6/12/2015   F     155     70
6    B  6/19/2015   M     167     46
7    B  6/19/2015   F     121     46

I can get the weight diffs by using this:

def diffw(df):
return(np.diff(df.Weight)*-1)

gb = ['Date', 'City']
gb=df.groupby(gb).apply(diffw)

gb

Date       City
6/12/2015  A        [65]
           B        [70]
6/19/2015  B        [46]
7/12/2015  A       [105]
dtype: object

I am just at a loss on how to get the wt_diffs back to the original df on each row.

Many thanks for any help . . . John

Upvotes: 1

Views: 223

Answers (1)

behzad.nouri
behzad.nouri

Reputation: 77961

You can use GroupBy.transform:

>>> f = df.groupby(['City', 'Date'])['Weight'].transform
>>> df['Wt_Diff'] = f('max') - f('min')
>>> df
  City       Date Sex  Weight  Wt_Diff
0    A  6/12/2015   M     185       65
1    A  6/12/2015   F     120       65
2    A  7/12/2015   M     210      105
3    A  7/12/2015   F     105      105
4    B  6/12/2015   M     225       70
5    B  6/12/2015   F     155       70
6    B  6/19/2015   M     167       46
7    B  6/19/2015   F     121       46

Edit: if max - min does not work, the easiest thing would be to add signed weight column first:

>>> df['+/-Weight'] = df['Weight'].where(df['Sex'] == 'M', -df['Weight'])
>>> df['Wt_Diff'] = df.groupby(['City', 'Date'])['+/-Weight'].transform('sum')

Upvotes: 2

Related Questions