Reputation: 785
I have a pandas dataframe with a bunch of records on certain dates. I need to group them by date and check whether each day's records also appear on the prior day. Specifically, I need to output which records were deleted.
Here is an example dataset:
Date Item
20160101 apple
20160101 pear
20160101 banana
20160102 apple
20160102 pear
20160102 beans
I need to figure out the differences that occur for each date. In this example, on 01/02/2016 the string 'beans' was added and 'banana' was removed from the group.
So far I have as my code:
groups = frame['Item'].groupby(frame['Date'])
for date, item in groups:
    for i in item:
        if i not in item[:-1]:
            print date, item, 'Deleted'
This doesn't seem to be working. The output I expect is:
20160102 , banana, Deleted
Thanks for your help!
Upvotes: 3
Views: 201
Reputation: 294488
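# Count each (Date, Item) pair, pivot items into columns (0 where absent), then diff consecutive dates
diffs = frame.groupby(frame.columns.tolist()).size().unstack(fill_value=0).diff()
diffs
# Blank out unchanged items (diff of 0), then label -1 as deleted and 1 as added
diffs.mask(diffs.eq(0)).stack().map({-1: 'deleted', 1: 'added'})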
Date      Item
20160102  banana    deleted
          beans       added
dtype: object
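For comparison, here is a minimal sketch of the same day-over-day check written with plain Python sets, which stays closer to the loop in the question. That loop only ever looks inside a single date's group, so it never compares against the previous date. The sketch assumes the dataframe is called `frame` with `Date` and `Item` columns as in the question; `items_by_date` and `changes` are names made up for illustration.

```python
import pandas as pd

# Example data from the question (rebuilt here so the sketch is self-contained).
frame = pd.DataFrame({
    "Date": [20160101, 20160101, 20160101, 20160102, 20160102, 20160102],
    "Item": ["apple", "pear", "banana", "apple", "pear", "beans"],
})

# Collect one set of items per date.
items_by_date = {date: set(group) for date, group in frame.groupby("Date")["Item"]}

# Walk consecutive dates in sorted order and compare each day with the one before it.
dates = sorted(items_by_date)
changes = []
for prev_date, date in zip(dates, dates[1:]):
    prev_items, items = items_by_date[prev_date], items_by_date[date]
    # Items present the day before but missing now were deleted.
    for item in sorted(prev_items - items):
        changes.append((date, item, "deleted"))
    # Items present now but not the day before were added.
    for item in sorted(items - prev_items):
        changes.append((date, item, "added"))

for date, item, status in changes:
    print(date, item, status)
```

This is slower than the vectorized version above on large frames, but it makes the comparison between one date and the previous one explicit.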
Upvotes: 3