Pandas: Merging 2 rows

Question

Given a dataset like this:

pd.DataFrame([["2013-01-01", 4, 1000, 1],
              ["2013-01-02", 8, 500, 1],
              ["2013-01-03", 4, 1000, -1],
              ["2013-01-04", 4, 1000, 1],
              ["2013-01-05", 3, 300, 1]], 
             columns=["date", "shop_id", "item_id", "item_count"])

We have the item_count variable which is either positive or negative. Being positive means the item was sold and being negative means the item was returned.

What I would like to do is to remove the rows where the same item was returned. For instance in the dataframe above row with id 0 and 2 will cancel each other out but row with id 3 will remain. How can I achieve this? I though about few solutions but they involve using .apply() and are not very efficient for large datasets. Thanks

BENY · Accepted Answer

Since return Item always come with the ordered item, you can using shift up to local the item which has been returned

df[(df.groupby('shop_id').item_count.shift(-1).fillna(1).ne(-1))&(df.item_count.ne(-1))]
Out[247]: 
         date  shop_id  item_id  item_count
1  2013-01-02        8      500           1
3  2013-01-04        4     1000           1
4  2013-01-05        3      300           1

Update :

df.groupby(['shop_id','item_id']).agg({'item_count':'sum','date':'last'})
Out[249]: 
                       date  item_count
shop_id item_id                        
3       300      2013-01-05           1
4       1000     2013-01-04           1
8       500      2013-01-02           1

Pandas: Merging 2 rows

Answers (1)

Related Questions