Reputation: 337
I am working with a dataframe, similar to this:
import pandas as pd

records = pd.DataFrame({'id': [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
                        'salary': [500, 500, 500, 300, 300, 400, 100, 100, 100],
                        'date': ['1/1', '4/1', '6/1', '1/1', '4/1', '6/1', '1/1', '4/1', '6/1']})
Resulting in
id salary date
0 1001 500 1/1
1 1001 500 4/1
2 1001 500 6/1
3 1002 300 1/1
4 1002 300 4/1
5 1002 400 6/1
6 1003 100 1/1
7 1003 100 4/1
8 1003 100 6/1
There can be more than three entries for each id, it varies.
I want to be able to add a column or otherwise test if, for each id, there was a salary change at some point in the year.
I came across this solution:
records.groupby('id').salary.apply(lambda x: len(set(x)) - 1)
Unfortunately, it did not work for me. In my large dataset it found some, but not all, of the instances where the salary had changed (I knew of one ID in particular with a salary change, but it did not appear in the results).
Can someone point me in the right direction? Much appreciated!
Upvotes: 0
Views: 78
Reputation: 26676
Try grouping by id and, with a lambda, flagging the rows whose salary differs from the first salary in their group. Then assign the label with np.where:
import numpy as np

# transform keeps the result aligned with the original rows
records['change'] = np.where(records.groupby('id').salary.transform(lambda x: x != x.iloc[0]),
                             'Change', 'nochange')
print(records)
id salary date change
0 1001 500 1/1 nochange
1 1001 500 4/1 nochange
2 1001 500 6/1 nochange
3 1002 300 1/1 nochange
4 1002 300 4/1 nochange
5 1002 400 6/1 Change
6 1003 100 1/1 nochange
7 1003 100 4/1 nochange
8 1003 100 6/1 nochange
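Note that this flags only the rows where the salary differs from the first one, not every row of the affected id. If you want a single flag for the whole id, a minimal sketch using transform('nunique') (the 'changed' column name here is just for illustration):

```python
import pandas as pd

records = pd.DataFrame({'id': [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
                        'salary': [500, 500, 500, 300, 300, 400, 100, 100, 100]})

# True on every row of an id whose salary changed at any point
records['changed'] = records.groupby('id')['salary'].transform('nunique').gt(1)
print(records)
```

Here all three rows of id 1002 get True, and the other ids get False.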
Upvotes: 1
Reputation: 13407
You should just be able to use groupby(...)["column"].nunique()
to get the number of unique elements in a grouped Series:
records.groupby('id').salary.nunique()
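Building on that, a short sketch that turns the per-id counts into a boolean "did the salary change" Series (variable names are just for illustration):

```python
import pandas as pd

records = pd.DataFrame({'id': [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
                        'salary': [500, 500, 500, 300, 300, 400, 100, 100, 100]})

# An id's salary changed iff it has more than one unique value
changed = records.groupby('id').salary.nunique().gt(1)
print(changed)
```

The result is indexed by id, so e.g. changed.loc[1002] is True and the other two ids are False.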
Upvotes: 1