Reputation: 197
col1 col2 col3 col4
'abc' 2 3 4
'asd' 4 5 6
'dfg' 7 5 6
'ghg' 2 3 4
'xyz' 1 3 4
Here I want to find the rows (specifically, the list of 'col1' values) that match on both 'col3' and 'col4'. Expected output:
[[asd,dfg],[abc,ghg,xyz]]
because asd and dfg share the same 'col3' and 'col4' values (5 and 6 respectively), while abc, ghg and xyz share 3 and 4.
Upvotes: 0
Views: 31
Reputation: 20669
You can use df.groupby here.
df.groupby('col3').col1.apply(list).tolist()
# [['abc', 'ghg', 'xyz'], ['asd', 'dfg']]
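Since the question asks for rows that agree on both 'col3' and 'col4', grouping on the two columns together may be the safer variant. For this sample data it gives the same result as grouping on 'col3' alone, but that would not hold in general. A minimal sketch, assuming the DataFrame is built as below (the construction is reconstructed from the question's table, with 'col1' taken to hold plain strings without the quote characters):

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption;
# col1 is taken to hold plain strings without the quote characters)
df = pd.DataFrame({
    'col1': ['abc', 'asd', 'dfg', 'ghg', 'xyz'],
    'col2': [2, 4, 7, 2, 1],
    'col3': [3, 5, 5, 3, 3],
    'col4': [4, 6, 6, 4, 4],
})

# Group on both key columns so rows only match when col3 AND col4 agree,
# then collect the col1 values of each group into a list
groups = df.groupby(['col3', 'col4'])['col1'].apply(list).tolist()
print(groups)
# [['abc', 'ghg', 'xyz'], ['asd', 'dfg']]
```

groupby sorts the groups by key by default, so the (3, 4) group comes before the (5, 6) group here.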
Upvotes: 1
Reputation: 1267
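Something like this might work:
# Strip the literal single quotes from the col1 values
df['col1'] = df['col1'].str.replace('\'','')
# Collect the col1 values of each col3 group into a list
df.groupby(['col3'])['col1'].apply(list).reset_index()['col1'].tolist()
# [['abc', 'ghg', 'xyz'], ['asd', 'dfg']]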
Upvotes: 1