Reputation: 337
I am working with a dataframe, similar to this:
import pandas as pd

records = pd.DataFrame({'id': [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
                        'salary': [500, 500, 500, 300, 300, 400, 100, 100, 100],
                        'date': ['1/1', '4/1', '6/1', '1/1', '4/1', '6/1', '1/1', '4/1', '6/1']})
Resulting in
id salary date
0 1001 500 1/1
1 1001 500 4/1
2 1001 500 6/1
3 1002 300 1/1
4 1002 300 4/1
5 1002 400 6/1
6 1003 100 1/1
7 1003 100 4/1
8 1003 100 6/1
There can be more than three entries for each id, it varies.
I want to be able to add a column or otherwise test if, for each id, there was a salary change at some point in the year.
I came across this solution:
records.groupby('id').salary.apply(lambda x: len(set(x)) - 1)
Unfortunately, it did not work for me. In my large dataset it found some, but not all, of the instances where the salary had changed (I knew of one ID in particular with a salary change, but it did not appear in the results).
Can someone point me in the right direction? Much appreciated!
Upvotes: 0
Views: 78
Reputation: 26676
Try grouping by id and, with a lambda, flagging the rows whose salary differs from the first salary in their group. Then assign the label with np.where:
import numpy as np

# transform keeps the result aligned with the original rows
records['change'] = np.where(records.groupby('id').salary.transform(lambda x: x != x.iloc[0]),
                             'Change', 'nochange')
print(records)
id salary date change
0 1001 500 1/1 nochange
1 1001 500 4/1 nochange
2 1001 500 6/1 nochange
3 1002 300 1/1 nochange
4 1002 300 4/1 nochange
5 1002 400 6/1 Change
6 1003 100 1/1 nochange
7 1003 100 4/1 nochange
8 1003 100 6/1 nochange
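Note that this flags only the rows where the salary differs from the first one, not every row of the affected id. If you want a single flag for the whole id, a minimal sketch using transform('nunique') (the 'changed' column name here is just for illustration):

```python
import pandas as pd

records = pd.DataFrame({'id': [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
                        'salary': [500, 500, 500, 300, 300, 400, 100, 100, 100]})

# True on every row of an id whose salary changed at any point
records['changed'] = records.groupby('id')['salary'].transform('nunique').gt(1)
print(records)
```

Here all three rows of id 1002 get True, and the other ids get False.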
Upvotes: 1
Reputation: 13407
You should just be able to use groupby(...)["column"].nunique()
to get the number of unique elements in a grouped Series:
records.groupby('id').salary.nunique()
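Building on that, a short sketch that turns the per-id counts into a boolean "did the salary change" Series (variable names are just for illustration):

```python
import pandas as pd

records = pd.DataFrame({'id': [1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003],
                        'salary': [500, 500, 500, 300, 300, 400, 100, 100, 100]})

# An id's salary changed iff it has more than one unique value
changed = records.groupby('id').salary.nunique().gt(1)
print(changed)
```

The result is indexed by id, so e.g. changed.loc[1002] is True and the other two ids are False.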
Upvotes: 1