How to take difference of matching df['keys'] and create new column for them

Question

I'm trying to find the wage gap between genders given a set of majors.

Here is a text version of my table:

    gender   field group   logwage
0     male      BUSINESS  7.229572
10  female      BUSINESS  7.072464
1     male    COMM/JOURN  7.108538
11  female    COMM/JOURN  7.015018
2     male  COMPSCI/STAT  7.340410
12  female  COMPSCI/STAT  7.169401
3     male     EDUCATION  6.888829
13  female     EDUCATION  6.770255
4     male   ENGINEERING  7.397082
14  female   ENGINEERING  7.323996
5     male    HUMANITIES  7.053048
15  female    HUMANITIES  6.920830
6     male      MEDICINE  7.319011
16  female      MEDICINE  7.193518
17  female        NATSCI  6.993337
7     male        NATSCI  7.089232
18  female         OTHER  6.881126
8     male         OTHER  7.091698
9     male  SOCSCI/PSYCH  7.197572
19  female  SOCSCI/PSYCH  6.968322

diff hasn't worked for me, as it will take the difference between every consecutive major.

and here is the code as it is now:

for row in sorted_mfield:
   if sorted_mfield['field group']==sorted_mfield['field group'].shift(1):
    diff= lambda x: x[0]-x[1]

My next strategy would be to go back to the unsorted dataframe where male and female were their own columns and make a difference from there, but since I've spent an hour trying to do this, and am pretty new to pandas, I thought I would ask and find out how this works. Thanks.

Daniel Labbe · Accepted Answer

Solution using Pandas.DataFrame.shift() in a sorted version of the data:

df.sort_values(by=['field group', 'gender'], inplace=True)
df['gap'] = df.logwage - df.logwage.shift(1)
df[df.gender =='male'][['field group', 'gap']]

Producing the following output with the sample data:

    field group     gap
0   BUSINESS        0.157108
2   COMM/JOURN      0.093520
4   COMPSCI/STAT    0.171009
6   EDUCATION       0.118574
8   ENGINEERING     0.073086
10  HUMANITIES      0.132218
12  MEDICINE        0.125493
15  NATSCI          0.095895
17  OTHER           0.210572
18  SOCSCI/PSYCH    0.229250

Note: it considers that you will always have a pair of values for each field group. If you want to validate it or eliminate field groups without this pair, the code below does the filtering:

df_grouped = df.groupby('field group') 
df_filtered = df_grouped.filter(lambda x: len(x) == 2)

How to take difference of matching df['keys'] and create new column for them

Answers (2)

Code:

Related Questions

How to take difference of matching df[&#39;keys&#39;] and create new column for them

Answers (2)

Code:

Related Questions

How to take difference of matching df['keys'] and create new column for them