PointXIV
PointXIV

Reputation: 1288

Delete matches from dataframes

I'm trying to compare two dataframes, and then delete the matches from both.

I figured tempSheet = tempSheet[tempSheet != testdf] would work, but I get an error saying

ValueError: Can only compared identically-labeled DataFrame objects

The column names are the same, so I'm guessing it's just impossible to do it that way.

Is there an obvious syntax error I have? Is there a way to use pd.merge to return the ones that are not matched?

My dataframes look like this:

   Qty    Price     
0  1      1.30
1  6      2.70
2  8      0.20
3  10     3.90
4  9      11.25
5  15     1.89
6  26     2.67
7  200    7.65
...

   Qty    Price
0  1      1.30
1  10     3.90
2  15     1.89
3  16     0.98
4  2      10.52
5  66     9.87
6  9      13.42
7  43     27.65
...

I want to cut the first down to only the matches, so

    Qty    Price
0   6      2.70
1   8      0.20
2   9      11.25
3   26     2.67
...

I would then do the same thing to the second.

Upvotes: 2

Views: 65

Answers (1)

behzad.nouri
behzad.nouri

Reputation: 77951

This will give you the indices of matches:

>>> hit = df1.reset_index().merge(df2.reset_index(),
...     on=['Qty', 'Price'], how='inner', suffixes=('-1', '-2'))
>>> hit
   index-1  Qty  Price  index-2
0        0    1   1.30        0
1        3   10   3.90        1
2        5   15   1.89        2

[3 rows x 4 columns]

if you want to remove matches, just drop index-1 from df1 and index-2 from df2.

>>> df1[~df1.index.isin(hit['index-1'])]  # or, df1.loc[df1.index - hit['index-1']]
   Qty  Price
1    6   2.70
2    8   0.20
4    9  11.25
6   26   2.67
7  200   7.65

[5 rows x 2 columns]
>>> df2[~df2.index.isin(hit['index-2'])]  # or, df2.loc[df2.index - hit['index-2']]
   Qty  Price
3   16   0.98
4    2  10.52
5   66   9.87
6    9  13.42
7   43  27.65

[5 rows x 2 columns]

Upvotes: 2

Related Questions