Reputation: 3773
df1 = {
'vouchers': [100, 200, 300, 400],
'units': [11, 12, 12, 13],
'some_other_data': ['a', 'b', 'c', 'd'],
}
df2 = {
'vouchers': [500, 200, 600, 300],
'units': [11, 12, 12, 13],
'some_other_data': ['b', 'd', 'c', 'a'],
}
Given the two dataframes like above, I want to do the following: if voucher from df1
can be found in df2
, and their corresponding unit is the same, then delete the entire voucher row from df1
.
So in this case the desired output would be:
df1 = {
'vouchers': [100, 300, 400],
'units': [11, 12, 13],
'some_other_data': ['a', 'c', 'd'],
}
What would be the best way to achieve this?
Upvotes: 4
Views: 1404
Reputation: 8816
Though we have many good answers, but the questions seems interesting so as the learning hence, i admit it in a great interest and would like to place another version which looks little simpler by using the booleans expression:
First DataFrame:
>>> df1
vouchers units some_other_data
0 100 11 a
1 200 12 b
2 300 12 c
3 400 13 d
Second DataFrame:
>>> df2
vouchers units some_other_data
0 500 11 a
1 200 12 b
2 600 12 c
3 300 13 d
Possible Simpler answer:
>>> df1[(df1 != df2).any(1)]
vouchers units some_other_data
0 100 11 a
2 300 12 c
3 400 13 d
Solution 2: Using merge
+ indicator
+ query
>>> df1.merge(df2, how='outer', indicator=True).query('_merge == "left_only"').drop('_merge', 1)
vouchers units some_other_data
0 100 11 a
2 300 12 c
3 400 13 d
Solution 3:
>>> df1[~df1.isin(df2).all(axis=1)]
vouchers units some_other_data
0 100 11 a
2 300 12 c
3 400 13 d
Upvotes: 2
Reputation: 129
Try this, it is simple:
excs = [] #will store the index of the values which are equal
for i, (key, value) in enumerate(zip(df1["vouchers"], df1["units"])):
for key2, value2 in zip(df2["vouchers"], df2["units"]):
if key == key2 and value == value2:
excs.append(i)
for exc in excs:
del(df1["vouchers"][exc])
del(df1["units"][exc])
Upvotes: 0
Reputation: 164613
One possibility via pd.DataFrame.duplicated
:
df = pd.concat([df1, df2], ignore_index=True)
df = df.loc[~df.duplicated(subset=['vouchers', 'units'], keep=False)]
df = df.reindex(df.index & df1.index)
print(df)
# some_other_data units vouchers
# 0 a 11 100
# 2 c 12 300
# 3 d 13 400
Upvotes: 1
Reputation: 402263
You can do this efficiently with index operations, using pd.Index.isin
:
u = df1.set_index(['vouchers', 'units'])
df1[~u.index.isin(pd.MultiIndex.from_arrays([df2.vouchers, df2.units]))]
vouchers units some_other_data
0 100 11 a
2 300 12 c
3 400 13 d
Upvotes: 5
Reputation: 161
My solution:
df1 = {
'vouchers': [100, 200, 300, 400],
'units': [11, 12, 12, 13],
'some_other_data': ['a', 'b', 'c', 'd']
}
df2 = {
'vouchers': [500, 200, 600, 300],
'units': [11, 12, 12, 13],
'some_other_data': ['a', 'b', 'c', 'd']
}
y = 0
for x in range(len(df1['vouchers'])):
if df1['vouchers'][x-y] == df2['vouchers'][x]:
if df1['units'][x-y] == df2['units'][x]:
for key in df1.keys():
del df1[key][x]
y += 1
Upvotes: 0
Reputation: 323226
Doing with merge
indicator
, after we get the index
need to remove , using drop
idx=df1.merge(df2,on=['vouchers','units'],indicator=True,how='left').\
loc[lambda x : x['_merge']=='both'].index
df1=df1.drop(idx,axis=0)
df1
Out[374]:
vouchers units some_other_data
0 100 11 a
2 300 12 c
3 400 13 d
Upvotes: 3