How to select rows in a DataFrame that are similar based on column values

Question

col1 col2 col3 col4
'abc' 2    3    4
'asd' 4    5    6
'dfg' 7    5    6
'ghg' 2    3    4
'xyz' 1    3    4

Here I want to find the rows (specifically, the list of 'col1' values) that are similar on the basis of 'col3' and 'col4'. Output:

[[asd,dfg],[abc,ghg,xyz]]

because asd and dfg share the same 'col3' and 'col4' values (5 and 6 respectively), while abc, ghg and xyz share 3 and 4

Ch3steR · Accepted Answer

You can use df.groupby here.

df.groupby('col3').col1.apply(list).tolist()
# [['abc', 'ghg', 'xyz'], ['asd', 'dfg']]
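
The one-liner above groups on 'col3' only, which happens to be enough for this sample because 'col4' always changes together with 'col3'. Since the question asks for similarity on both columns, here is a minimal, self-contained sketch that groups on 'col3' and 'col4' together. The DataFrame is rebuilt from the table in the question, and the name `groups` is just an illustrative choice, not something from the original post.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "col1": ["abc", "asd", "dfg", "ghg", "xyz"],
    "col2": [2, 4, 7, 2, 1],
    "col3": [3, 5, 5, 3, 3],
    "col4": [4, 6, 6, 4, 4],
})

# Group on both columns so rows end up together only when
# col3 AND col4 match, then collect the col1 values of each group.
groups = df.groupby(["col3", "col4"])["col1"].apply(list).tolist()
print(groups)
# [['abc', 'ghg', 'xyz'], ['asd', 'dfg']]
```

Groups come back sorted by key, so the order differs from the order in the question. If rows with no match should be left out, something like `[g for g in groups if len(g) > 1]` would probably do, assuming "similar" means at least two rows share the values.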
