Reputation: 23
I have data of this shape: each employee and their monthly salary, which increases over the months.
Employee year month Salary
PersonA 2001 1 $50000
PersonB 2001 5 $65000
PersonB 2002 1 $75000
PersonB 2002 3 $100000
PersonC 2002 5 $75000
PersonC 2002 6 $100000
PersonC 2003 3 $110000
PersonC 2003 9 $130000
PersonC 2004 3 $150000
PersonC 2005 3 $200000
I want to create the same shape, but with an extra column called previous month salary:
Employee year month Salary previous month salary
PersonA 2001 1 $50000 0
PersonB 2001 5 $65000 0
PersonB 2002 1 $75000 $65000
PersonB 2002 3 $100000 $75000
PersonC 2002 5 $75000 0
PersonC 2002 6 $100000 $75000
PersonC 2003 3 $110000 $100000
PersonC 2003 9 $130000 $110000
PersonC 2004 3 $150000 $130000
PersonC 2005 3 $200000 $150000
I tried groupby in pandas, but I could not subtract one from the month value. This is only a sample; the real data has every month, so getting the previous month's value is all I need.
When I tried groupby, I could not work out how to do the subtraction:
df["previous_salary"]=df.groupby(['year',"month"])['salary'].transform('mean').astype(np.float16)
df["previous_salary"]=df.groupby(['year',"month"])['salary']
The result was either the mean or the value of the same month:
Employee year month Salary previous month salary
PersonA 2001 1 $50000 $50000
PersonB 2001 5 $65000 $65000
PersonB 2002 1 $75000 $75000
PersonB 2002 3 $100000 $100000
PersonC 2002 5 $75000 $75000
PersonC 2002 6 $100000 $100000
PersonC 2003 3 $110000 $110000
PersonC 2003 9 $130000 $130000
PersonC 2004 3 $150000 $150000
PersonC 2005 3 $200000 $200000
Is there a way to subtract one from the month before I group by it, or is there another way to do it?
Upvotes: 2
Views: 384
Reputation: 23099
IIUC, you can try groupby with shift:
df["prev"] = (
    df.sort_values(["Employee", "year", "month"]).groupby("Employee")["Salary"].shift(1)
)
print(df)
Employee year month Salary prev
0 PersonA 2001 1 $50000 NaN
1 PersonB 2001 5 $65000 NaN
2 PersonB 2002 1 $75000 $65000
3 PersonB 2002 3 $100000 $75000
4 PersonC 2002 5 $75000 NaN
5 PersonC 2002 6 $100000 $75000
6 PersonC 2003 3 $110000 $100000
7 PersonC 2003 9 $130000 $110000
8 PersonC 2004 3 $150000 $130000
9 PersonC 2005 3 $200000 $150000
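If you want 0 instead of NaN for each employee's first row, as in the desired output in the question, one option is to fill the gaps after shifting. This is only a minimal sketch: it rebuilds the sample data and assumes Salary is stored as plain integers rather than "$"-prefixed strings, and the column name previous_salary is just illustrative.

```python
import pandas as pd

# Sample data from the question; Salary is assumed to be numeric here.
df = pd.DataFrame({
    "Employee": ["PersonA", "PersonB", "PersonB", "PersonB", "PersonC",
                 "PersonC", "PersonC", "PersonC", "PersonC", "PersonC"],
    "year": [2001, 2001, 2002, 2002, 2002, 2002, 2003, 2003, 2004, 2005],
    "month": [1, 5, 1, 3, 5, 6, 3, 9, 3, 3],
    "Salary": [50000, 65000, 75000, 100000, 75000,
               100000, 110000, 130000, 150000, 200000],
})

# Sort so that "previous row" really means "previous record in time".
df = df.sort_values(["Employee", "year", "month"], ignore_index=True)

# Shift within each employee, then replace the leading NaN with 0.
df["previous_salary"] = (
    df.groupby("Employee")["Salary"].shift(1).fillna(0).astype(int)
)
print(df)
```

Note that this takes the previous available record per employee, which is not necessarily the previous calendar month when months are missing.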
Upvotes: 2
Reputation: 150735
You can use groupby().shift() to get the previous data:
prev_salaries = df.groupby(['Employee']).Salary.shift()
# where there is no previous row, fall back to the current month's salary
df['prev_salary'] = prev_salaries.fillna(df['Salary'])
Output:
Employee year month Salary prev_salary
0 PersonA 2001 1 $50000 $50000
1 PersonB 2001 5 $65000 $65000
2 PersonB 2002 1 $75000 $65000
3 PersonB 2002 3 $100000 $75000
4 PersonC 2002 5 $75000 $75000
5 PersonC 2002 6 $100000 $75000
6 PersonC 2003 3 $110000 $100000
7 PersonC 2003 9 $130000 $110000
8 PersonC 2004 3 $150000 $130000
9 PersonC 2005 3 $200000 $150000
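shift() returns the previous row per employee, whatever month it belongs to. If you need the salary from strictly the previous calendar month (the "subtract one from the month" idea in the question), a self-merge on a running month index is one possible approach. This is a hedged sketch rather than the only way: month_idx and prev_month_salary are made-up names, and Salary is assumed to be numeric.

```python
import pandas as pd

# Sample data from the question; Salary is assumed to be numeric here.
df = pd.DataFrame({
    "Employee": ["PersonA", "PersonB", "PersonB", "PersonB", "PersonC",
                 "PersonC", "PersonC", "PersonC", "PersonC", "PersonC"],
    "year": [2001, 2001, 2002, 2002, 2002, 2002, 2003, 2003, 2004, 2005],
    "month": [1, 5, 1, 3, 5, 6, 3, 9, 3, 3],
    "Salary": [50000, 65000, 75000, 100000, 75000,
               100000, 110000, 130000, 150000, 200000],
})

# Running month number, so December -> January is still a step of exactly 1.
keyed = df.assign(month_idx=df["year"] * 12 + (df["month"] - 1))

# Lookup table: move every record one month forward, so the row for
# month m lines up with the row for month m + 1 of the same employee.
prev = keyed[["Employee", "month_idx", "Salary"]].copy()
prev["month_idx"] += 1
prev = prev.rename(columns={"Salary": "prev_month_salary"})

# Left merge keeps every original row; rows with no record in the
# preceding calendar month get NaN.
out = keyed.merge(prev, on=["Employee", "month_idx"], how="left")
out = out.drop(columns="month_idx")
print(out)
```

On the sample data only PersonC's 2002-06 row has a record in the month directly before it, so every other row gets NaN. With real data that contains every month, this and groupby().shift() should agree once the rows are sorted.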
Upvotes: 2