Hana
Hana

Reputation: 1470

How to remove rows of a DataFrame based off of data from another DataFrame?

I'm new to pandas and I'm trying to figure this scenario out: I have a sample DataFrame with two products. df =

  Product_Num     Date   Description  Price 
          10    1-1-18   Fruit Snacks  2.99
          10    1-2-18   Fruit Snacks  2.99
          10    1-5-18   Fruit Snacks  1.99
          10    1-8-18   Fruit Snacks  1.99
          10    1-10-18  Fruit Snacks  2.99
          45    1-1-18         Apples  2.99 
          45    1-3-18         Apples  2.99
          45    1-5-18         Apples  2.99
          45    1-9-18         Apples  1.49
          45    1-10-18        Apples  1.49
          45    1-13-18        Apples  1.49
          45    1-15-18        Apples  2.99 

I also have another small DataFrame that looks like this (which shows promotional prices of the same products): df2=

  Product_Num   Price 
          10    1.99
          45    1.49 

Notice that df2 does not contain columns 'Date' nor 'Description.' What I want to do is to remove all promo prices from df1 (for all dates that are on promo), using the data from df1. What is the best way to do this?

So, I want to see this:

  Product_Num     Date   Description  Price 
          10    1-1-18   Fruit Snacks  2.99
          10    1-2-18   Fruit Snacks  2.99
          10    1-10-18  Fruit Snacks  2.99
          45    1-1-18         Apples  2.99 
          45    1-3-18         Apples  2.99
          45    1-5-18         Apples  2.99
          45    1-15-18        Apples  2.99 

I was thinking of doing a merge on columns Price and Product_Num, then seeing what I can do from there. But I was getting confused because of the multiple dates.

Upvotes: 6

Views: 18775

Answers (4)

Chava S
Chava S

Reputation: 1

cute and readable

promo_prices = df2['Price']
promo_prods = df2['Product_Num']

no_pro = df

for price, prod in zip(promo_prices, promo_prods):
    no_pro = no_pro.where(df != (price or prod)).dropna()

Upvotes: 0

DJK
DJK

Reputation: 9264

You could turn df2 into a dictionary and then filter out the values in df1

df[df[df2.columns].isin(df2.to_dict('list')).sum(1) <= 1]

Yeilds

      Date   Description  Price  Product_Num
0    1-1-18  Fruit Snacks   2.99           10
1    1-2-18  Fruit Snacks   2.99           10
4   1-10-18  Fruit Snacks   2.99           10
5    1-1-18        Apples   2.99           45
6    1-3-18        Apples   2.99           45
7    1-5-18        Apples   2.99           45
11  1-15-18        Apples   2.99           45

Upvotes: 1

BENY
BENY

Reputation: 323226

isin with &

df.loc[~((df.Product_Num.isin(df2['Product_Num']))&(df.Price.isin(df2['Price']))),:]
Out[246]: 
    Product_Num     Date  Description  Price
0            10   1-1-18  FruitSnacks   2.99
1            10   1-2-18  FruitSnacks   2.99
4            10  1-10-18  FruitSnacks   2.99
5            45   1-1-18       Apples   2.99
6            45   1-3-18       Apples   2.99
7            45   1-5-18       Apples   2.99
11           45  1-15-18       Apples   2.99

Update

df.loc[~df.index.isin(df.merge(df2.assign(a='key'),how='left').dropna().index)]
Out[260]: 
    Product_Num     Date  Description  Price
0            10   1-1-18  FruitSnacks   2.99
1            10   1-2-18  FruitSnacks   2.99
4            10  1-10-18  FruitSnacks   2.99
5            45   1-1-18       Apples   2.99
6            45   1-3-18       Apples   2.99
7            45   1-5-18       Apples   2.99
11           45  1-15-18       Apples   2.99

Upvotes: 9

piratefache
piratefache

Reputation: 1368

With Product_Num as index for both Dataframe, you can drop index from df1 for df2, then concatenate the dataframes :

import pandas as pd

df1 = pd.DataFrame({'Product_Num':[1,2,3,4], 'Date': ['01/01/2012','01/02/2013','02/03/2013','04/02/2013'], 'Price': [10,10,10,10]})
df1 = df1.set_index('Product_Num')
df2 = pd.DataFrame({'Product_Num':[2], 'Date':['03/3/2012'], 'Price': [5]})
df2 = df2.set_index('Product_Num')

Drop and concatenate:

df_new = df1.drop(df2.index)
df_new = pd.concat([df_new, df2])

Result:

               Date  Price
Product_Num                   
1            01/01/2012     10
3            02/03/2013     10
4            04/02/2013     10
2             03/3/2012      5

Upvotes: 2

Related Questions