Kaiser

Reputation: 197

How to select rows in a DataFrame that are similar based on column values

My DataFrame is:

col1 col2 col3 col4
'abc' 2    3    4
'asd' 4    5    6
'dfg' 7    5    6
'ghg' 2    3    4
'xyz' 1    3    4

Here I want to find the rows (specifically, the lists of 'col1' values) that are similar based on 'col3' and 'col4'. Output:

[[asd,dfg],[abc,ghg,xyz]]

because asd and dfg share the same 'col3' and 'col4' values (5 and 6 respectively), while abc, ghg and xyz share the values 3 and 4.

Upvotes: 0

Views: 31

Answers (2)

Ch3steR

Reputation: 20669

You can use df.groupby here.

df.groupby(['col3', 'col4']).col1.apply(list).tolist()
# [['abc', 'ghg', 'xyz'], ['asd', 'dfg']]
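
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the frame looks like the sample in the question (rebuilt by hand below, with the 'col1' values stored without literal quote characters) and groups on both 'col3' and 'col4', since the question names both columns. The variable name `groups` is only for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question; col1 holds plain strings here (assumption).
df = pd.DataFrame({
    "col1": ["abc", "asd", "dfg", "ghg", "xyz"],
    "col2": [2, 4, 7, 2, 1],
    "col3": [3, 5, 5, 3, 3],
    "col4": [4, 6, 6, 4, 4],
})

# Group on both key columns and collect the col1 values of each group.
# groupby sorts the group keys by default, so (3, 4) comes before (5, 6).
groups = df.groupby(["col3", "col4"])["col1"].apply(list).tolist()
print(groups)
# [['abc', 'ghg', 'xyz'], ['asd', 'dfg']]
```

Passing `sort=False` to `groupby` should keep the groups in order of first appearance instead of sorted key order, if that matters for your use case.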

Upvotes: 1

Sajan

Reputation: 1267

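Something like this might work:

# Strip the literal quote characters from col1, if they are part of the data
df['col1'] = df['col1'].str.replace("'", '')
# Group on both key columns and collect the col1 values of each group
df.groupby(['col3', 'col4'])['col1'].apply(list).reset_index()['col1'].tolist()
# [['abc', 'ghg', 'xyz'], ['asd', 'dfg']]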

Upvotes: 1
