JungleDiff
JungleDiff

Reputation: 3485

Conditional selection on Python pandas

Suppose I have this dataframe (call it df): enter image description here

Here's what I want to do with the dataframe: 1. Select the rows that match with Col1 and Col2, if there are two rows for each id. 2. If there's only one row for the id, then select the row, even if the Col1 and Col2 do not match.

df = df[df['Col1'] == df['Col2']]

This code is wrong, because it doesn't satisfy the requirement 2 above. This is the result I want:enter image description here

I would really appreciate it if someone could explain to me how to accomplish this! Thank you.

Upvotes: 1

Views: 1227

Answers (1)

jezrael
jezrael

Reputation: 862681

Assuming there are only unique and duplicated values with length 2 in id column.

Then use duplicated for select all duplicates with ~ for inverse mask - select all unique rows:

m1 = df['Col1'] == df['Col2']
m2 = df['id'].duplicated(keep=False)
df = df[(m1 & m2) | ~m2]
print (df)
     Col1   Col2  Col3 id
0   Pizza  Pizza   100  1
3   Pizza  Pizza   300  2
4   Ramen  Ramen   230  3
6   Ramen  Pizza    13  4
8   Pizza  Pizza    13  5
10  Ramen  Ramen    30  6
11  Pizza  Ramen    45  7

Upvotes: 2

Related Questions