Iyad Al aqel

Reputation: 2022

Remove duplicates based on the content of two columns not the order

I have a correlation matrix that I melted into a dataframe, so now I have, for example, the following:

First      Second       Value
A          B            0.5
B          A            0.5
A          C            0.2 

I want to delete only one of the first two rows, since they hold the same pair in a different order. What would be the way to do it?

Upvotes: 0

Views: 1483

Answers (3)

rnso

Reputation: 24535

One can also use the following approach:

# create a new column by sorting and joining the 'First' and 'Second' labels of each row:
df['newcol'] = df.apply(lambda x: "".join(sorted([x['First'], x['Second']])), axis=1)
print(df)

  First Second  Value newcol
0     A      B    0.5     AB
1     B      A    0.5     AB
2     A      C    0.2     AC

# get its non-duplicated indexes and remove the new column: 
df = df[~df.newcol.duplicated()].iloc[:,:3]
print(df)

  First Second  Value
0     A      B    0.5
2     A      C    0.2
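
A caveat worth illustrating: a key built by concatenating the two labels into one string can collide once labels are longer than one character. The sketch below is not part of the answer above; it uses made-up multi-character labels and swaps in a sorted tuple as the key, which keeps the two labels separate. The names joined, pair and deduped are only illustrative.

```python
import pandas as pd

# Hypothetical data with multi-character labels (not from the question).
df = pd.DataFrame({
    "First": ["AB", "A", "C"],
    "Second": ["C", "BC", "AB"],
    "Value": [0.5, 0.3, 0.5],
})

# Joined-string key: ("AB", "C") and ("A", "BC") both collapse to "ABC".
joined = pd.Series(
    ["".join(sorted([a, b])) for a, b in zip(df["First"], df["Second"])],
    index=df.index,
)

# Sorted-tuple key: the two labels stay distinct elements.
pair = pd.Series(
    [tuple(sorted((a, b))) for a, b in zip(df["First"], df["Second"])],
    index=df.index,
)

# Keep the first occurrence of each unordered pair.
deduped = df[~pair.duplicated()]
print(deduped)
```

With the tuple key only the third row, which repeats the first pair in reverse order, is dropped.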

Upvotes: 0

cs95

Reputation: 402383

You could call duplicated on the columns after sorting each row with np.sort:

df = df.loc[~pd.DataFrame(np.sort(df.iloc[:, :2], axis=1)).duplicated()]
df

  First Second  Value
0     A      B    0.5
2     A      C    0.2

Details

np.sort(df.iloc[:, :2], axis=1)

array([['A', 'B'],
       ['A', 'B'],
       ['A', 'C']], dtype=object)

~pd.DataFrame(np.sort(df.iloc[:, :2], axis=1)).duplicated()

0     True
1    False
2     True
dtype: bool

Sort the values within each row, so that (A, B) and (B, A) become identical, and figure out which rows are duplicates. The resulting mask is then used to filter the dataframe via boolean indexing.
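
For anyone who wants to run the steps end to end, here is a minimal self-contained sketch. It rebuilds the three rows from the question by hand; the variable names sorted_pairs, mask and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "First": ["A", "B", "A"],
    "Second": ["B", "A", "C"],
    "Value": [0.5, 0.5, 0.2],
})

# Sort the two labels within each row, so (B, A) becomes (A, B).
sorted_pairs = np.sort(df[["First", "Second"]].to_numpy(), axis=1)

# True for the first occurrence of each unordered pair, False for repeats.
mask = ~pd.DataFrame(sorted_pairs).duplicated()

# Boolean indexing keeps only the rows where the mask is True.
result = df[mask]
print(result)
```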

To reset the index, use reset_index:

df.reset_index(drop=True)

  First Second  Value
0     A      B    0.5
1     A      C    0.2

Upvotes: 1

jezrael

Reputation: 862581

Use:

# to select the columns by name
m = ~pd.DataFrame(np.sort(df[['First','Second']], axis=1)).duplicated()
# to select the columns by position
#m = ~pd.DataFrame(np.sort(df.iloc[:,:2], axis=1)).duplicated()
print(m)

0     True
1    False
2     True
dtype: bool

df = df[m]
print(df)
  First Second  Value
0     A      B    0.5
2     A      C    0.2
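
One thing to watch for: the mask is built from a plain NumPy array, so it carries a fresh 0..n-1 index. If the dataframe has some other index (for example after earlier filtering), the mask might not align with it. A hedged sketch, assuming a made-up index of 10, 11, 12, that passes the dataframe's own index along:

```python
import numpy as np
import pandas as pd

# Same data as the question, but with a hypothetical non-default index.
df = pd.DataFrame(
    {
        "First": ["A", "B", "A"],
        "Second": ["B", "A", "C"],
        "Value": [0.5, 0.5, 0.2],
    },
    index=[10, 11, 12],
)

# Sort the labels within each row so the order of the pair no longer matters.
sorted_pairs = np.sort(df[["First", "Second"]].to_numpy(), axis=1)

# Reuse df's index so the boolean mask lines up with df's rows.
m = ~pd.DataFrame(sorted_pairs, index=df.index).duplicated()

# Filter, then renumber the remaining rows from 0.
out = df[m].reset_index(drop=True)
print(out)
```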

Upvotes: 1