Reputation: 55
I have the following data set:
data_input:
A B
1 C13D C07H
2 C07H C13D
3 B42C B65H
4 B65H B42C
5 A45B A47C
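Rows 1 and 2 in data_input hold the same pair of values, just with the columns swapped (the same goes for rows 3 and 4). I only want to keep one row per pair, so row 2 should be dropped.
The desired output is: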
data_output:
A B
1 C13D C07H
2 B42C B65H
3 A45B A47C
Upvotes: 4
Views: 190
Reputation: 76917
You could use duplicated and np.sort:
In [1279]: df[~df.apply(np.sort, axis=1).duplicated()]
Out[1279]:
A B
1 C13D C07H
3 B42C B65H
5 A45B A47C
Details
In [1281]: df.apply(np.sort, axis=1)
Out[1281]:
A B
1 C07H C13D
2 C07H C13D
3 B42C B65H
4 B42C B65H
5 A45B A47C
In [1282]: df.apply(np.sort, axis=1).duplicated()
Out[1282]:
1 False
2 True
3 False
4 True
5 False
dtype: bool
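On recent pandas versions, df.apply(np.sort, axis=1) may return a Series of arrays instead of a DataFrame, in which case duplicated() fails because arrays are not hashable. Below is a minimal sketch of the same idea that sorts the underlying NumPy array directly. The names sorted_rows and result are made up for this example, and the DataFrame is rebuilt from the question's data:

```python
import numpy as np
import pandas as pd

# Sample data from the question (index 1..5, as shown there).
df = pd.DataFrame(
    {
        "A": ["C13D", "C07H", "B42C", "B65H", "A45B"],
        "B": ["C07H", "C13D", "B65H", "B42C", "A47C"],
    },
    index=[1, 2, 3, 4, 5],
)

# Sort the values within each row so that (A, B) and (B, A) look identical.
sorted_rows = pd.DataFrame(
    np.sort(df.to_numpy(), axis=1), index=df.index, columns=df.columns
)

# Keep only the first occurrence of each unordered pair.
result = df[~sorted_rows.duplicated()]
print(result)
```

If the renumbered index from the question's data_output is wanted, result.reset_index(drop=True) gives a fresh 0-based index.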
Upvotes: 0
Reputation: 656
You can create a third column 'C' based on 'A' and 'B' and use it to find duplicates, like so:
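# Concatenate the two codes into one string per row
df['C'] = df['A'] + df['B']
# Sort the characters of that string so swapped pairs give the same key
df['C'] = df['C'].apply(lambda x: ''.join(sorted(x)))
# Drop rows with a repeated key, then keep only the original columns
df = df.drop_duplicates(subset='C')[['A', 'B']]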
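One caveat with the key above: it sorts individual characters, so two different pairs that happen to use the same characters (for example 'AB12'/'CD34' and 'AD14'/'CB32') would end up with the same key. A possible variation, sketched here under the assumption that each cell should be compared as a whole value, builds the key from a sorted tuple of the two cells instead. The names key and deduped are made up for this example:

```python
import pandas as pd

# Sample data from the question (index 1..5, as shown there).
df = pd.DataFrame(
    {
        "A": ["C13D", "C07H", "B42C", "B65H", "A45B"],
        "B": ["C07H", "C13D", "B65H", "B42C", "A47C"],
    },
    index=[1, 2, 3, 4, 5],
)

# Order-independent key per row: a tuple of the two cell values, sorted.
# Tuples are hashable, so duplicated() can compare them directly.
key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(df["A"], df["B"])],
    index=df.index,
)

# Keep the first row seen for each key; no helper column is added to df.
deduped = df[~key.duplicated()]
print(deduped)
```

Keeping the key in a separate Series also avoids having to drop column 'C' afterwards.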
Upvotes: 8