Reputation: 2022
I have a correlation matrix that I melted into a dataframe, so now I have the following, for example:
First Second Value
A B 0.5
B A 0.5
A C 0.2
I want to delete only one of the first two rows. What would be the way to do it?
Upvotes: 0
Views: 1483
Reputation: 24535
One can also use the following approach:
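# create a new column by sorting and joining the 'First' and 'Second' labels
# (sorting the two labels, not their characters, so multi-character labels stay intact):
df['newcol'] = df.apply(lambda x: "".join(sorted([x['First'], x['Second']])), axis=1)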
print(df)
First Second Value newcol
0 A B 0.5 AB
1 B A 0.5 AB
2 A C 0.2 AC
# get its non-duplicated indexes and remove the new column:
df = df[~df.newcol.duplicated()].iloc[:,:3]
print(df)
First Second Value
0 A B 0.5
2 A C 0.2
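A minimal, self-contained sketch of the same key-column idea, assuming the sample data from the question. Instead of a joined string it uses a tuple of the two sorted labels as the key, which is a substitution on my part rather than the answer's exact code; the names key and deduped are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "First": ["A", "B", "A"],
    "Second": ["B", "A", "C"],
    "Value": [0.5, 0.5, 0.2],
})

# Order-independent key: the two labels of each row as a sorted tuple,
# so (B, A) and (A, B) produce the same key.
key = pd.Series(
    [tuple(sorted(pair)) for pair in zip(df["First"], df["Second"])],
    index=df.index,
)

# Keep only the first row seen for each key.
deduped = df[~key.duplicated()]
print(deduped)
```

A tuple key avoids accidental collisions that a joined string could produce when labels are longer than one character, and no helper column has to be added and removed.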
Upvotes: 0
Reputation: 402383
You could call duplicated on the np.sort-ed columns and use the result as a mask:
df = df.loc[~pd.DataFrame(np.sort(df.iloc[:, :2])).duplicated()]
df
First Second Value
0 A B 0.5
2 A C 0.2
Details
np.sort(df.iloc[:, :2])
array([['A', 'B'],
['A', 'B'],
['A', 'C']], dtype=object)
~pd.DataFrame(np.sort(df.iloc[:, :2], axis=1)).duplicated()
0 True
1 False
2 True
dtype: bool
Sort the values within each row of the two label columns, then figure out which rows are duplicates. The negated mask is then used to filter the dataframe via boolean indexing.
To reset the index, use reset_index:
df.reset_index(drop=True)
First Second Value
0 A B 0.5
1 A C 0.2
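The steps above can be put together as a runnable sketch, assuming the sample data from the question. Passing index=df.index to the temporary frame is my own addition: it keeps the mask aligned with df if the dataframe does not have the default 0..n-1 index. The names pairs, keep and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "First": ["A", "B", "A"],
    "Second": ["B", "A", "C"],
    "Value": [0.5, 0.5, 0.2],
})

# Sort the two label columns within each row so (B, A) becomes (A, B).
pairs = np.sort(df[["First", "Second"]].to_numpy(), axis=1)

# True for rows to keep (first occurrence of each unordered pair).
# Reusing df.index keeps the mask aligned with df.
keep = ~pd.DataFrame(pairs, index=df.index).duplicated()

# Filter by the mask and renumber the rows.
result = df[keep].reset_index(drop=True)
print(result)
```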
Upvotes: 1
Reputation: 862581
Use:
# to select columns by column names
m = ~pd.DataFrame(np.sort(df[['First','Second']], axis=1)).duplicated()
# to select columns by positions
#m = ~pd.DataFrame(np.sort(df.iloc[:,:2], axis=1)).duplicated()
print(m)
0 True
1 False
2 True
dtype: bool
df = df[m]
print(df)
First Second Value
0 A B 0.5
2 A C 0.2
Upvotes: 1