Pandas transform('unique') output as comma separate string instead of list

I have a DataFrame that looks like this:

df = pd.DataFrame({'ID':[1,1,2,2,3,4],'Name':['John Doe','Jane Doe','John Smith','Jane Smith','Jack Hill','Jill Hill']})

    ID  Name
0   1   John Doe
1   1   Jane Doe
2   2   John Smith
3   2   Jane Smith
4   3   Jack Hill
5   4   Jill Hill

I then added another column grouping by ID and taking the unique values in Name:

df['Multi Name'] = df.groupby('ID')['Name'].transform('unique')

    ID  Name    Multi Name
0   1   John Doe    [John Doe, Jane Doe]
1   1   Jane Doe    [John Doe, Jane Doe]
2   2   John Smith  [John Smith, Jane Smith]
3   2   Jane Smith  [John Smith, Jane Smith]
4   3   Jack Hill   [Jack Hill]
5   4   Jill Hill   [Jill Hill]

How do I remove the Brackets from Multi Name?

I have tried:

df['Multi Name'] = df['Multi Name'].str.strip('[]')


ID  Name    Multi Name
0   1   John Doe    NaN
1   1   Jane Doe    NaN
2   2   John Smith  NaN
3   2   Jane Smith  NaN
4   3   Jack Hill   NaN
5   4   Jill Hill   NaN

Desired output:

    ID  Name    Multi Name
0   1   John Doe    John Doe, Jane Doe
1   1   Jane Doe    John Doe, Jane Doe
2   2   John Smith  John Smith, Jane Smith
3   2   Jane Smith  John Smith, Jane Smith
4   3   Jack Hill   Jack Hill
5   4   Jill Hill   Jill Hill

Upvotes: 2

Answers (3)

piRSquared

Reputation: 294488

`transform`

df.join(df.groupby('ID').Name.transform('unique').rename('Multi Name'))

   ID        Name                Multi Name
0   1    John Doe      [John Doe, Jane Doe]
1   1    Jane Doe      [John Doe, Jane Doe]
2   2  John Smith  [John Smith, Jane Smith]
3   2  Jane Smith  [John Smith, Jane Smith]
4   3   Jack Hill               [Jack Hill]
5   4   Jill Hill               [Jill Hill]

df.join(df.groupby('ID').Name.transform('unique').str.join(', ').rename('Multi Name'))

   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill

`map`

df.join(df.ID.map(df.groupby('ID').Name.unique().str.join(', ')).rename('Multi Name'))

   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill

`itertools.groupby`

from itertools import groupby

d = {
    k: ', '.join(x[1] for x in v)
    for k, v in groupby(sorted(set(zip(df.ID, df.Name))), key=lambda x: x[0])
}

df.join(df.ID.map(d).rename('Multi Name'))

   ID        Name              Multi Name
0   1    John Doe      Jane Doe, John Doe
1   1    Jane Doe      Jane Doe, John Doe
2   2  John Smith  Jane Smith, John Smith
3   2  Jane Smith  Jane Smith, John Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill

Upvotes: 5

Scott Boston

Reputation: 153500

Use map and join:

df['Multi Name'] = df.groupby('ID')['Name'].transform('unique').map(', '.join)

Output:

   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill

Upvotes: 3

cs95

Reputation: 402814

Looks like unique is the wrong choice of function here. I'd recommend a custom lambda function using str.join:

df['Multi Name'] = df.groupby('ID')['Name'].transform(lambda x: ', '.join(set(x)))

df
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  Jane Smith, John Smith
3   2  Jane Smith  Jane Smith, John Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill

Upvotes: 5

Pandas transform(&#39;unique&#39;) output as comma separate string instead of list

Answers (3)

transform

map

itertools.groupby

Related Questions

Pandas transform('unique') output as comma separate string instead of list

`transform`

`map`

`itertools.groupby`