Reputation: 1949
I have a DF that looks something like
c1 c2 c3
1 A B x
2 A C y
3 B A x
4 B D z
5 A B y
As you can see, lines 1 and 3 are repeated if we disregard that c1
and c2
are different columns (or if they become reversed). However, line 5 is not. How can I drop rows based on columns c1
and c2
, regardless of where the repeated values are?
Thanks in advance
Upvotes: 2
Views: 679
Reputation: 2472
Also it could be done by sorting the row values using sorted():
df[~df[['c1','c2']].apply(lambda row: sorted(row), axis = 1).duplicated()]
Upvotes: 0
Reputation: 863216
You can select columns by subset, sorting by numpy.sort
,create new DataFrame from array and use DataFrame.duplicated
with filtering by inverse condition by boolean indexing
:
df = df[~pd.DataFrame(np.sort(df[['c1','c2']], axis=1), index=df.index).duplicated()]
print (df)
c1 c2 c3
1 A B x
2 A C y
4 B D z
Or:
df = df[~pd.DataFrame(np.sort(df[['c1','c2']], axis=1)).duplicated().values]
Upvotes: 3
Reputation: 323326
Ok let us try something new frozenset, will order your column in to sorted tuple
, then using the duplicated
df[~df[['c1','c2']].apply(frozenset,axis=1).duplicated()]
Out[666]:
c1 c2 c3
1 A B x
2 A C y
4 B D z
Upvotes: 4