Reputation: 397
Lets assume a df1
df1 = pd.DataFrame(
{'col1': {0: 500.0, 1: 500.0, 2: 833.3, 3: 500.0, 4: 833.3, 5: 500.0, 6: 833.3},
'col2': {0: 1833.3, 1: 1000.0, 2: 1833.3, 3: 2666.7, 4: 1833.3, 5: 3500.0, 6: 1000.0},
'col3': {0: 250.0, 1: 250.0, 2: 30.0, 3: 30.0, 4: 30.0, 5: 103.3, 6: 176.7},
'col4': {0: 3.4, 1: 4.0, 2: 2.2, 3: 3.4, 4: 2.2, 5: 4.0, 6: 3.4},
'col5': {0: 0.25, 1: 0.15, 2: 0.1, 3: 0.25, 4: 0.25, 5: 0.1, 6: 0.1},
'col6': {0: 364, 1: 937, 2: 579, 3: 313, 4: 600, 5: 49, 6: 13}})
And a df2
df2 = pd.DataFrame(
{'col1': {0: 833.3, 1: 500.0, 2: 500.0, 3: 500.0, 4: 500.0, 5: 500.0, 6: 500.0, 7: 833.3, 8: 500.0, 9: 833.3, 10: 500.0, 11: 500.0, 12: 833.3, 13: 833.3, 14: 833.3},
'col2': {0: 1833.3, 1: 1000.0, 2: 1833.3, 3: 3500.0, 4: 3500.0, 5: 1000.0, 6: 2666.7, 7: 1833.3, 8: 2666.7, 9: 1000.0, 10: 2666.7, 11: 2666.7, 12: 1000.0, 13: 1833.3, 14: 1833.3},
'col3': {0: 30.0, 1: 250.0, 2: 250.0, 3: 103.3, 4: 176.7, 5: 103.3, 6: 30.0, 7: 103.3, 8: 30.0, 9: 176.7, 10: 250.0, 11: 103.3, 12: 30.0, 13: 30.0, 14: 250.0},
'col4': {0: 2.2, 1: 4.0, 2: 3.4, 3: 4.0, 4: 2.2, 5: 2.8, 6: 2.8, 7: 2.8, 8: 3.4, 9: 3.4, 10: 2.8, 11: 2.8, 12: 3.4, 13: 2.2, 14: 2.8},
'col5': {0: 0.25, 1: 0.15, 2: 0.25, 3: 0.1, 4: 0.2, 5: 0.15, 6: 0.15, 7: 0.25, 8: 0.25, 9: 0.1, 10: 0.15, 11: 0.1, 12: 0.15, 13: 0.1, 14: 0.2}})
What is the most pythonic way to drop the rows in df2 where col1 and col2 and col3 and col4 (and coln for that matter) have the same values as the respective columns of df1? I don't want to merge the dataframes, only drop any rows in df2 (might be multiple) where the row tuple over the interest columns is the same in both dfs.
I have only come up with how to do this using :
new_df = df2.loc[df2[col1].isin(df1[col1]) &
df2[col2].isin(df1[col2]) &
df2[col3].isin(df1[col3]) &
df2[col4].isin(df1[col4]) &
df2[col5].isin(df1[col5]) ]
which is a bit cumbersome for larger datasets and for more columns.
Any ideas for a better way?
Upvotes: 1
Views: 136
Reputation: 164673
You can use set_index
combined with pd.Index.difference
to extract your result:
idx = df2.set_index(list(df2)).index\
.difference(df1.set_index(list(df1)).index)
res = df2.set_index(list(df2)).loc[idx].reset_index()
The benefit of this method is it doesn't require numeric to string conversion.
Upvotes: 1
Reputation: 323236
You can using isin
,before doing that we need create a key using all value from col1~ coln(convert to str and paste together)
df2[~df2[df1.columns].astype(str).sum(1).isin(df1.astype(str).sum(1))]
Upvotes: 2