Conditional selection on Python pandas

Question

Suppose I have this dataframe (call it df):

Here's what I want to do with the dataframe: 1. Select the rows that match with Col1 and Col2, if there are two rows for each id. 2. If there's only one row for the id, then select the row, even if the Col1 and Col2 do not match.

df = df[df['Col1'] == df['Col2']]

This code is wrong, because it doesn't satisfy the requirement 2 above. This is the result I want:

I would really appreciate it if someone could explain to me how to accomplish this! Thank you.

jezrael · Accepted Answer

Assuming there are only unique and duplicated values with length 2 in id column.

Then use duplicated for select all duplicates with ~ for inverse mask - select all unique rows:

m1 = df['Col1'] == df['Col2']
m2 = df['id'].duplicated(keep=False)
df = df[(m1 & m2) | ~m2]
print (df)
     Col1   Col2  Col3 id
0   Pizza  Pizza   100  1
3   Pizza  Pizza   300  2
4   Ramen  Ramen   230  3
6   Ramen  Pizza    13  4
8   Pizza  Pizza    13  5
10  Ramen  Ramen    30  6
11  Pizza  Ramen    45  7

Conditional selection on Python pandas

Answers (1)

Related Questions