Reputation: 1288
I'm trying to compare two dataframes, and then delete the matches from both.
I figured tempSheet = tempSheet[tempSheet != testdf]
would work, but I get an error saying
ValueError: Can only compared identically-labeled DataFrame objects
The column names are the same, so I'm guessing it's just impossible to do it that way.
Is there an obvious syntax error I have? Is there a way to use pd.merge
to return the ones that are not matched?
My dataframes look like this:
Qty Price
0 1 1.30
1 6 2.70
2 8 0.20
3 10 3.90
4 9 11.25
5 15 1.89
6 26 2.67
7 200 7.65
...
Qty Price
0 1 1.30
1 10 3.90
2 15 1.89
3 16 0.98
4 2 10.52
5 66 9.87
6 9 13.42
7 43 27.65
...
I want to cut the first down to only the matches, so
Qty Price
0 6 2.70
1 8 0.20
2 9 11.25
3 26 2.67
...
I would then do the same thing to the second.
Upvotes: 2
Views: 65
Reputation: 77951
This will give you the indices of matches:
>>> hit = df1.reset_index().merge(df2.reset_index(),
... on=['Qty', 'Price'], how='inner', suffixes=('-1', '-2'))
>>> hit
index-1 Qty Price index-2
0 0 1 1.30 0
1 3 10 3.90 1
2 5 15 1.89 2
[3 rows x 4 columns]
if you want to remove matches, just drop index-1
from df1
and index-2
from df2
.
>>> df1[~df1.index.isin(hit['index-1'])] # or, df1.loc[df1.index - hit['index-1']]
Qty Price
1 6 2.70
2 8 0.20
4 9 11.25
6 26 2.67
7 200 7.65
[5 rows x 2 columns]
>>> df2[~df2.index.isin(hit['index-2'])] # or, df2.loc[df2.index - hit['index-2']]
Qty Price
3 16 0.98
4 2 10.52
5 66 9.87
6 9 13.42
7 43 27.65
[5 rows x 2 columns]
Upvotes: 2