Reputation: 1946
For the following dataframe, I need calculate the change in 'count', for each set of date, location_id, uid and include the set in the results.
# Sample DataFrame
df = pd.DataFrame({'date': ['2021-01-01', '2021-01-01','2021-01-01','2021-01-02', '2021-01-02','2021-01-02'],
'location_id':[1001,2001,3001, 1001,2001,3001],
'uid': ['001', '003', '002','001', '004','002'],
'uid_count':[1, 2,3 ,2, 2, 4]})
date location_id uid count
0 2021-01-01 1001 001 1
1 2021-01-01 2001 003 2
2 2021-01-01 3001 002 3
3 2021-01-02 1001 001 2
4 2021-01-02 2001 004 2
5 2021-01-02 3001 002 4
My desired results would look like:
# Desired Results
date location_id uid
2021-01-01 1001 001 0
2001 003 0
3001 002 0
2021-01-02 1001 001 1
2001 004 0
3001 002 1
I thought I could do this via groupby
by using the following, but the desired calculation isn't made:
# Current code:
df.groupby(['date','location_id','uid'],sort=False).apply(lambda x: (x['count'].values[-1] - x['count'].values[0]))
# Current results:
date location_id uid
2021-01-01 1001 001 0
2001 003 0
3001 002 0
2021-01-02 1001 001 0
2001 004 0
3001 002 0
How can I get the desired results?
Upvotes: 1
Views: 959
Reputation: 62413
.transform()
is used to calculate the differences for consecutive occurrences of 'uid_count'
, for each uid
, with the same index as df
..groupby(['date','location_id','uid']
, is that each group only contains a single value.'uid_count'
at the end, with .drop(columns='uid_count')
, if desired.import pandas as pd
# sort the dataframe
df = df.sort_values(['date', 'location_id', 'uid'])
# groupby and transform based on the difference in uid_count
uid_count_diff = df.groupby(['location_id', 'uid']).uid_count.transform(lambda x: x.diff()).fillna(0).astype(int)
# create a column in df
df['uid_count_diff'] = uid_count_diff
# set the index
df = df.set_index(['date', 'location_id', 'uid'])
# result
uid_count uid_count_diff
date location_id uid
2021-01-01 1001 001 1 0
2001 003 2 0
3001 002 3 0
2021-01-02 1001 001 2 1
2001 004 2 0
3001 002 4 1
Upvotes: 1