barciewicz
barciewicz

Reputation: 3773

Delete a row from a dataframe if its column values are found in another

df1 = {
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd'],
    }
df2 = {
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['b', 'd', 'c', 'a'],
    }

Given the two dataframes like above, I want to do the following: if voucher from df1 can be found in df2, and their corresponding unit is the same, then delete the entire voucher row from df1.

So in this case the desired output would be:

df1 = {
    'vouchers': [100, 300, 400],
    'units': [11, 12, 13],
    'some_other_data': ['a', 'c', 'd'],
    }

What would be the best way to achieve this?

Upvotes: 4

Views: 1404

Answers (6)

Karn Kumar
Karn Kumar

Reputation: 8816

Though we have many good answers, but the questions seems interesting so as the learning hence, i admit it in a great interest and would like to place another version which looks little simpler by using the booleans expression:

First DataFrame:

>>> df1
   vouchers  units some_other_data
0       100     11               a
1       200     12               b
2       300     12               c
3       400     13               d

Second DataFrame:

>>> df2
   vouchers  units some_other_data
0       500     11               a
1       200     12               b
2       600     12               c
3       300     13               d

Possible Simpler answer:

>>> df1[(df1 != df2).any(1)]
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d

Solution 2: Using merge + indicator + query

>>> df1.merge(df2, how='outer', indicator=True).query('_merge == "left_only"').drop('_merge', 1)
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d

Solution 3:

>>> df1[~df1.isin(df2).all(axis=1)]
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d

Upvotes: 2

Abdullah
Abdullah

Reputation: 129

Try this, it is simple:

excs = [] #will store the index of the values which are equal

for i, (key, value) in enumerate(zip(df1["vouchers"], df1["units"])):
  for key2, value2 in zip(df2["vouchers"], df2["units"]):
    if key == key2 and value == value2:
      excs.append(i)

for exc in excs:
  del(df1["vouchers"][exc])
  del(df1["units"][exc])

Upvotes: 0

jpp
jpp

Reputation: 164613

One possibility via pd.DataFrame.duplicated:

df = pd.concat([df1, df2], ignore_index=True)
df = df.loc[~df.duplicated(subset=['vouchers', 'units'], keep=False)]
df = df.reindex(df.index & df1.index)

print(df)

#   some_other_data  units  vouchers
# 0               a     11       100
# 2               c     12       300
# 3               d     13       400

Upvotes: 1

cs95
cs95

Reputation: 402263

You can do this efficiently with index operations, using pd.Index.isin:

u = df1.set_index(['vouchers', 'units'])
df1[~u.index.isin(pd.MultiIndex.from_arrays([df2.vouchers, df2.units]))]

   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d

Upvotes: 5

KaPy3141
KaPy3141

Reputation: 161

My solution:

df1 = {
    'vouchers': [100, 200, 300, 400],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd']
    }
df2 = {
    'vouchers': [500, 200, 600, 300],
    'units': [11, 12, 12, 13],
    'some_other_data': ['a', 'b', 'c', 'd']
    }  

y = 0
for x in range(len(df1['vouchers'])):
    if df1['vouchers'][x-y] == df2['vouchers'][x]:
        if df1['units'][x-y] == df2['units'][x]:
            for key in df1.keys():
                del df1[key][x]
            y += 1

Upvotes: 0

BENY
BENY

Reputation: 323226

Doing with merge indicator , after we get the index need to remove , using drop

idx=df1.merge(df2,on=['vouchers','units'],indicator=True,how='left').\
     loc[lambda x : x['_merge']=='both'].index
df1=df1.drop(idx,axis=0)
df1
Out[374]: 
   vouchers  units some_other_data
0       100     11               a
2       300     12               c
3       400     13               d

Upvotes: 3

Related Questions