Pandas DataFrame: find rows which do not have an equivalent value in the DataFrame

Question

DataFrame:

    column1     column2
0   some_data   string1
1   some_data   string1
2   some_data   string2
3   some_data   string3
4   some_data   string2
5   some_data   string4
6   some_data   string4
...
20k+ rows in total

Explanation: For most rows, the column2 values appear in pairs. I want to find the rows whose column2 value has no pair (e.g. string3).

Expected Output:

   column1    column2
0   some_data  string3

Is there a way to find such rows? Thanks!

jezrael · Accepted Answer

If the problem can be simplified to finding all rows whose column2 value has no duplicates, use:

df1 = df[~df['column2'].duplicated(keep=False)]
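
As a minimal sketch of how this behaves, the snippet below rebuilds a small frame modelled on the question (the seven rows and a default 0..6 index are assumptions made for illustration) and applies the same mask:

```python
import pandas as pd

# Illustrative data modelled on the question; only the first seven rows are assumed.
df = pd.DataFrame({
    "column1": ["some_data"] * 7,
    "column2": ["string1", "string1", "string2", "string3",
                "string2", "string4", "string4"],
})

# duplicated(keep=False) flags every row whose column2 value occurs more than once.
mask = df["column2"].duplicated(keep=False)

# Negating the mask keeps only the rows whose value occurs exactly once.
df1 = df[~mask]
print(df1)
```

Note that the original index label of the unpaired row is preserved; call reset_index(drop=True) on the result if a fresh 0-based index is wanted.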

If you need to check the counts and keep all rows whose column2 value does not occur exactly twice (that is, not as a single pair):

df2 = df[df.groupby('column2')['column2'].transform('size').ne(2)]

If values may also occur as several pairs (counts of 2, 4, 6, 8, ...), keep the rows whose column2 value occurs an odd number of times:

df3 = df[df.groupby('column2')['column2'].transform('size') % 2 == 1]
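
The difference between the two size-based filters only shows up when a value occurs more than twice. The sketch below uses made-up values "a" to "d" with counts 1, 2, 3 and 4 (hypothetical data, not from the question) to compare them:

```python
import pandas as pd

# Hypothetical data: "a" occurs once, "b" twice, "c" three times, "d" four times.
df = pd.DataFrame({
    "column2": ["a", "b", "b", "c", "c", "c", "d", "d", "d", "d"],
})

# Group size broadcast back to every row, aligned with df's index.
sizes = df.groupby("column2")["column2"].transform("size")

# Keeps every group whose size is not exactly 2: "a", "c" and "d".
not_exactly_two = df[sizes.ne(2)]

# Keeps only groups with an odd size: "a" and "c".
odd_count = df[sizes % 2 == 1]

print(sorted(not_exactly_two["column2"].unique()))
print(sorted(odd_count["column2"].unique()))
```

Both filters return every row of a matching group, so for "c" all three rows are kept rather than only the one left over after pairing.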
