Reputation: 877
DataFrame:
column1 column2
0 some_data string1
1 some_data string1
2 some_data string2
3 some_data string3
4 some_data string2
5 some_data string4
5 some_data string4
...
20k+ rows in total
Explanation: For most rows, column2 data appear in pairs. I want to find out rows that do not have paired data (e.g. string3)
Expected Output:
column1 column2
0 some_data string3
Any solutions to find out such rows? thanks!
Upvotes: 0
Views: 44
Reputation: 261000
Here is an alternative to take only a pair number of rows, working as a pipeline:
(df.groupby('column2', as_index=False)
.apply(lambda d: d.iloc[:2*(len(d)//2)]) # transform any odd to previous even
.droplevel(0)
)
output:
column1 column2
0 some_data string1
1 some_data string1
2 some_data string2
4 some_data string2
5 some_data string4
5 some_data string4
Upvotes: 0
Reputation: 862901
If possible simplify problem for found all rows without dupes by column2
use:
df1 = df[~df['column2'].duplicated(keep=False)]
If need test counts and filter all rows without pairs (2):
df2 = df[df.groupby('column2')['column2'].transform('size').ne(2)]
Also if need test all pairs, it means 2, 4, 6, 8
... use:
df3 = df[df.groupby('column2')['column2'].transform('size') % 2 == 1]
Upvotes: 1