Reputation: 221
I am in a situation where I get data on various items and sometimes some fields for an item changes. I'm looking for an easy way to detect when something has changed. Here's an example. Let's say have like this:
df = pd.DataFrame({'ID': [1, 1, 1, 2, 3, 2],
'SUBID': [1, 2, 1, 1, 1, 1],
'val1': [1, 4, 1, 7, 10, 8],
'val2': [2, 5, 2, 8, 11, 9],
'val3': [3, 6, 4, 9, 12, 9],
'date': ['20200413', '20200413', '20200414', '20200414',
'20200415', '20200416']})
ID SUBID val1 val2 val3 date
1 1 1 2 3 20200413
1 2 4 5 6 20200413
1 1 1 2 4 20200414
2 1 7 8 9 20200414
3 1 10 11 12 20200415
2 1 8 9 9 20200416
I'd like an operation that gives me something like this:
ID SUBID col date prev new
0 1 1 val3 20200414 3 4
1 2 1 val1 20200416 7 8
2 2 1 val2 20200416 8 9
Thanks in advance!
Upvotes: 1
Views: 93
Reputation: 30940
Here is my approach with GroupBy.shift
and DataFrame.melt
new_df = (df.melt(['ID','SUBID','date'], value_name='new')
.assign(prev=lambda x: x.groupby(['ID','SUBID','variable'])['new']
.shift())
.loc[lambda x: ~x['new'].eq(x['prev']) & x['prev'].notna()]
.sort_values(['ID','SUBID']))
print(new_df)
ID SUBID date variable new prev
14 1 1 20200414 val3 4 3.0
5 2 1 20200416 val1 8 7.0
11 2 1 20200416 val2 9 8.0
Upvotes: 2
Reputation: 29635
IIUC, you can do it by first reshaping the dataframe with set_index
, stack
, then groupby
some columns to agg
date and the values columns with first
and/or last
, then to filter only the one that have changed you can query
.
df_f = (df.set_index(['ID', 'SUBID', 'date'])
.rename_axis(columns='col').stack()
.reset_index('date', name='val')
.groupby(['ID', 'SUBID', 'col'])
.agg(date=('date','last'), prev=('val','first'), new=('val','last'))
.query('prev != new').reset_index())
print (df_f)
ID SUBID col date prev new
0 1 1 val3 20200414 3 4
1 2 1 val1 20200416 7 8
2 2 1 val2 20200416 8 9
Upvotes: 2