Reputation: 3485
Suppose I have this dataframe (call it df):
Here's what I want to do with the dataframe: 1. Select the rows that match with Col1 and Col2, if there are two rows for each id. 2. If there's only one row for the id, then select the row, even if the Col1 and Col2 do not match.
df = df[df['Col1'] == df['Col2']]
This code is wrong, because it doesn't satisfy the requirement 2 above. This is the result I want:
I would really appreciate it if someone could explain to me how to accomplish this! Thank you.
Upvotes: 1
Views: 1227
Reputation: 862681
Assuming there are only unique and duplicated values with length 2 in id
column.
Then use duplicated
for select all duplicates with ~
for inverse mask - select all unique rows:
m1 = df['Col1'] == df['Col2']
m2 = df['id'].duplicated(keep=False)
df = df[(m1 & m2) | ~m2]
print (df)
Col1 Col2 Col3 id
0 Pizza Pizza 100 1
3 Pizza Pizza 300 2
4 Ramen Ramen 230 3
6 Ramen Pizza 13 4
8 Pizza Pizza 13 5
10 Ramen Ramen 30 6
11 Pizza Ramen 45 7
Upvotes: 2