kspmm

Reputation: 23

How to group by year and month to get the previous month's salary?

My data lists each employee and their monthly salary, which increases over the months:

Employee    year    month     Salary
PersonA     2001    1         $50000 
PersonB     2001    5         $65000 
PersonB     2002    1         $75000 
PersonB     2002    3         $100000 
PersonC     2002    5         $75000 
PersonC     2002    6         $100000 
PersonC     2003    3         $110000 
PersonC     2003    9         $130000 
PersonC     2004    3         $150000 
PersonC     2005    3         $200000

I want to keep the same shape but add an extra column holding the previous month's salary:

Employee    year    month     Salary     previous month salary 
PersonA     2001    1         $50000     0
PersonB     2001    5         $65000     0
PersonB     2002    1         $75000     $65000
PersonB     2002    3         $100000    $75000
PersonC     2002    5         $75000     0
PersonC     2002    6         $100000    $75000
PersonC     2003    3         $110000    $100000
PersonC     2003    9         $130000    $110000
PersonC     2004    3         $150000    $130000
PersonC     2005    3         $200000    $150000

I tried groupby in pandas, but I could not work out how to subtract one from the month value. This is only a sample; the real data has every month, so getting the previous month's value would be enough.

These are the groupby attempts I made, neither of which does the subtraction:

df["previous_salary"]=df.groupby(['year',"month"])['salary'].transform('mean').astype(np.float16)

df["previous_salary"]=df.groupby(['year',"month"])['salary']

The result was either the mean or the value for the same month:

Employee    year    month     Salary     previous month salary 
PersonA     2001    1         $50000     $50000
PersonB     2001    5         $65000     $65000
PersonB     2002    1         $75000     $75000
PersonB     2002    3         $100000    $100000
PersonC     2002    5         $75000     $75000 
PersonC     2002    6         $100000    $100000
PersonC     2003    3         $110000    $110000
PersonC     2003    9         $130000    $130000
PersonC     2004    3         $150000    $150000
PersonC     2005    3         $200000    $200000

Is there a way to subtract one from the month before grouping by it, or is there another way to do this?

Upvotes: 2

Views: 384

Answers (2)

Umar.H

Reputation: 23099

IIUC, you can sort by employee, year and month, then use groupby with shift to pull each employee's previous row:

df["prev"] = (
    df.sort_values(["Employee", "year", "month"]).groupby("Employee")["Salary"].shift(1)
)


print(df)
  Employee  year  month   Salary     prev
0  PersonA  2001      1   $50000      NaN
1  PersonB  2001      5   $65000      NaN
2  PersonB  2002      1   $75000   $65000
3  PersonB  2002      3  $100000   $75000
4  PersonC  2002      5   $75000      NaN
5  PersonC  2002      6  $100000   $75000
6  PersonC  2003      3  $110000  $100000
7  PersonC  2003      9  $130000  $110000
8  PersonC  2004      3  $150000  $130000
9  PersonC  2005      3  $200000  $150000
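
The question asks for 0 rather than NaN on each employee's first row. A minimal sketch of one way to get that, assuming the salaries have already been converted to integers (the "$" prefix stripped) and using only the first six sample rows:

```python
import pandas as pd

# Assumption: salaries stored as plain integers, not "$..." strings
df = pd.DataFrame({
    "Employee": ["PersonA", "PersonB", "PersonB", "PersonB", "PersonC", "PersonC"],
    "year": [2001, 2001, 2002, 2002, 2002, 2002],
    "month": [1, 5, 1, 3, 5, 6],
    "Salary": [50000, 65000, 75000, 100000, 75000, 100000],
})

df["prev"] = (
    df.sort_values(["Employee", "year", "month"])  # chronological order within each employee
    .groupby("Employee")["Salary"]
    .shift(1)      # previous row's salary within each employee
    .fillna(0)     # first row of each employee has no previous salary
    .astype(int)   # shift() introduces NaN, which turns the column into floats
)

print(df)
```

The assignment aligns on the index, so the result lands on the right rows even if df itself is not sorted.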

Upvotes: 2

Quang Hoang

Reputation: 150735

You can use groupby().shift() to get the previous data:

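# assumes rows are already sorted by Employee, year, month
prev_salaries = df.groupby(['Employee']).Salary.shift()

# the first row of each employee has no previous value; fill it with the current salary
df['prev_salary'] = prev_salaries.fillna(df['Salary'])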

Output:

  Employee  year  month   Salary prev_salary
0  PersonA  2001      1   $50000      $50000
1  PersonB  2001      5   $65000      $65000
2  PersonB  2002      1   $75000      $65000
3  PersonB  2002      3  $100000      $75000
4  PersonC  2002      5   $75000      $75000
5  PersonC  2002      6  $100000      $75000
6  PersonC  2003      3  $110000     $100000
7  PersonC  2003      9  $130000     $110000
8  PersonC  2004      3  $150000     $130000
9  PersonC  2005      3  $200000     $150000
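
Note that shift() returns the previous row, which is the previous calendar month only when no months are missing. If gaps are possible and only the immediately preceding month should count, one option is a self-merge on a running month number. This is a sketch, not part of the answer above: the month_index column and the small sample frame are assumptions made for illustration, and salaries are taken to be integers.

```python
import pandas as pd

# Hypothetical sample with a year rollover (2001-12 -> 2002-01) and a gap (2002-06 -> 2003-03)
df = pd.DataFrame({
    "Employee": ["PersonB", "PersonB", "PersonC", "PersonC", "PersonC"],
    "year": [2001, 2002, 2002, 2002, 2003],
    "month": [12, 1, 5, 6, 3],
    "Salary": [65000, 75000, 75000, 100000, 110000],
})

# Running month number, so consecutive months differ by exactly 1 across year boundaries
df["month_index"] = df["year"] * 12 + df["month"]

# Each row supplies the "previous salary" for the following month of the same employee
prev = (
    df[["Employee", "month_index", "Salary"]]
    .rename(columns={"Salary": "prev_salary"})
    .assign(month_index=lambda d: d["month_index"] + 1)
)

out = df.merge(prev, on=["Employee", "month_index"], how="left")

# Rows with no record for the preceding month get 0
out["prev_salary"] = out["prev_salary"].fillna(0).astype(int)

print(out)
```

Here PersonB's 2002-01 row picks up the 2001-12 salary, while PersonC's 2003-03 row gets 0 because there is no 2003-02 record.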

Upvotes: 2
