crocefisso

Reputation: 903

How to aggregate string values with common keys in a Pandas dataframe?

Here is a minimal example of the input pandas DataFrame:

  |-----+---------|
  | key | Value   |
  |-----+---------|
  |  A  | alpha   |
  |  B  | beta    |
  |  B  | gamma   |
  |  B  | delta   |
  |  C  | delta   |
  |  D  | delta   |
  |  D  | epsilon |
  |-----+---------|

Here is the output I'd like to generate with pandas:

  |-----+------------------|
  | key | Value            |
  |-----+------------------|
  |  A  | alpha            |
  |  B  | beta gamma delta |
  |  B  | beta gamma delta |
  |  B  | beta gamma delta |
  |  C  | delta            |
  |  D  | delta epsilon    |
  |  D  | delta epsilon    |
  |-----+------------------|

Upvotes: 0

Views: 107

Answers (1)

Anurag Dabas

Reputation: 24324

Try groupby() with transform():

df['Value']=df.groupby('key')['Value'].transform(' '.join)

OR

via groupby(), agg() and map() (unique() keeps values in first-seen order, which a set would not guarantee):

df['Value']=df['key'].map(df.groupby('key')['Value'].agg(lambda s: ' '.join(s.unique())))

Output of df:

  key             Value
0   A             alpha
1   B  beta gamma delta
2   B  beta gamma delta
3   B  beta gamma delta
4   C             delta
5   D     delta epsilon
6   D     delta epsilon
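
For reference, here is a self-contained sketch that rebuilds the sample frame from the question and applies the transform() approach. The frame is typed in by hand from the question's table, so treat it as an illustration rather than the asker's real data:

```python
import pandas as pd

# Sample data copied from the question's input table
df = pd.DataFrame({
    "key":   ["A", "B", "B", "B", "C", "D", "D"],
    "Value": ["alpha", "beta", "gamma", "delta", "delta", "delta", "epsilon"],
})

# transform() returns a result aligned with the original index,
# so every row of a group receives the same joined string
df["Value"] = df.groupby("key")["Value"].transform(" ".join)

print(df)
```

Because transform() keeps the original shape, the frame still has seven rows; agg() alone would collapse it to one row per key.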

Update:

Use drop_duplicates() to remove duplicate (key, Value) pairs before joining:

out=df.drop_duplicates(subset=['key','Value']).groupby('key')['Value'].transform(' '.join)
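
Note that out above is indexed by the de-duplicated rows only, so it can be shorter than df when duplicates exist. If every original row should still receive the joined string, one possible sketch is to aggregate once per key and map the result back. The sample frame and the name joined are hypothetical, chosen only to show a repeated pair:

```python
import pandas as pd

# Hypothetical sample containing a repeated (key, Value) pair
df = pd.DataFrame({
    "key":   ["B", "B", "B", "D", "D"],
    "Value": ["beta", "gamma", "beta", "delta", "delta"],
})

# Drop repeated pairs, then join each key's remaining values
# in first-seen order (one result row per key)
joined = (
    df.drop_duplicates(subset=["key", "Value"])
      .groupby("key")["Value"]
      .agg(" ".join)
)

# map() looks up each row's key, so all original rows are filled
df["Value"] = df["key"].map(joined)

print(df)
```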

Upvotes: 2
