Reputation: 10040
I have a dataframe with two columns, with string entries, eg:
Col1 Col2
E1 K1
E2 K2
E3 K3
...
I want to create an index on Col2 but the entries there are not unique. So I would like to merge all rows of shared entries in Col2. Say K2, K3 are identical. Then after the merge what I would like to have is the Col1 values serialized into a list forming just one row, ie:
Col1 Col2
E1 K1
[E2, E3] K2
...
What would be the cleanest way of achieving this?
Upvotes: 0
Views: 61
Reputation: 9946
since you don't know a priori what the dupes are, you can do:
df.groupby('Col2').agg(lambda s: list(s))
for some reason .agg(list)
doesn't work for me...
Upvotes: 0
Reputation: 323226
You can using mapper
with groupby
df.set_index('Col2').groupby(by={'K1':'K1','K2':'K2','K3':'K2'},axis=0).Col1.apply(list)
Out[116]:
K1 [E1]
K2 [E2, E3]
Name: Col1, dtype: object
Upvotes: 1
Reputation: 79188
First change K3
into K2
then groupby column2 and aggregate on it
df[df=='K3']='K2'
df.groupby('Col2').agg(list)
Col1
Col2
K1 [E1]
K2 [E2, E3]
Upvotes: 1