Reputation: 2669
Given a dataset like this:
pd.DataFrame([["2013-01-01", 4, 1000, 1],
["2013-01-02", 8, 500, 1],
["2013-01-03", 4, 1000, -1],
["2013-01-04", 4, 1000, 1],
["2013-01-05", 3, 300, 1]],
columns=["date", "shop_id", "item_id", "item_count"])
We have the item_count
variable which is either positive or negative. Being positive means the item was sold and being negative means the item was returned.
What I would like to do is to remove the rows where the same item was returned. For instance in the dataframe above row with id 0 and 2 will cancel each other out but row with id 3 will remain. How can I achieve this? I though about few solutions but they involve using .apply()
and are not very efficient for large datasets. Thanks
Upvotes: 0
Views: 56
Reputation: 323226
Since return Item always come with the ordered item, you can using shift
up to local the item which has been returned
df[(df.groupby('shop_id').item_count.shift(-1).fillna(1).ne(-1))&(df.item_count.ne(-1))]
Out[247]:
date shop_id item_id item_count
1 2013-01-02 8 500 1
3 2013-01-04 4 1000 1
4 2013-01-05 3 300 1
Update :
df.groupby(['shop_id','item_id']).agg({'item_count':'sum','date':'last'})
Out[249]:
date item_count
shop_id item_id
3 300 2013-01-05 1
4 1000 2013-01-04 1
8 500 2013-01-02 1
Upvotes: 1