gen
gen

Reputation: 10040

How to merge rows in a pandas dataframe that share the same designated index entry?

I have a dataframe with two columns, with string entries, eg:

Col1 Col2
E1     K1
E2     K2
E3     K3
...

I want to create an index on Col2 but the entries there are not unique. So I would like to merge all rows of shared entries in Col2. Say K2, K3 are identical. Then after the merge what I would like to have is the Col1 values serialized into a list forming just one row, ie:

Col1 Col2
E1     K1
[E2, E3]     K2
...

What would be the cleanest way of achieving this?

Upvotes: 0

Views: 61

Answers (3)

acushner
acushner

Reputation: 9946

since you don't know a priori what the dupes are, you can do:

df.groupby('Col2').agg(lambda s: list(s))

for some reason .agg(list) doesn't work for me...

Upvotes: 0

BENY
BENY

Reputation: 323226

You can using mapper with groupby

df.set_index('Col2').groupby(by={'K1':'K1','K2':'K2','K3':'K2'},axis=0).Col1.apply(list)
Out[116]: 
K1        [E1]
K2    [E2, E3]
Name: Col1, dtype: object

Upvotes: 1

Onyambu
Onyambu

Reputation: 79188

First change K3 into K2 then groupby column2 and aggregate on it

 df[df=='K3']='K2'
 df.groupby('Col2').agg(list)
          Col1
Col2          
K1        [E1]
K2    [E2, E3]

Upvotes: 1

Related Questions