Reputation: 14113
I have a DataFrame that looks like this:
df = pd.DataFrame({'ID':[1,1,2,2,3,4],'Name':['John Doe','Jane Doe','John Smith','Jane Smith','Jack Hill','Jill Hill']})
ID Name
0 1 John Doe
1 1 Jane Doe
2 2 John Smith
3 2 Jane Smith
4 3 Jack Hill
5 4 Jill Hill
I then added another column grouping by ID and taking the unique values in Name:
df['Multi Name'] = df.groupby('ID')['Name'].transform('unique')
ID Name Multi Name
0 1 John Doe [John Doe, Jane Doe]
1 1 Jane Doe [John Doe, Jane Doe]
2 2 John Smith [John Smith, Jane Smith]
3 2 Jane Smith [John Smith, Jane Smith]
4 3 Jack Hill [Jack Hill]
5 4 Jill Hill [Jill Hill]
How do I remove the Brackets from Multi Name?
I have tried:
df['Multi Name'] = df['Multi Name'].str.strip('[]')
ID Name Multi Name
0 1 John Doe NaN
1 1 Jane Doe NaN
2 2 John Smith NaN
3 2 Jane Smith NaN
4 3 Jack Hill NaN
5 4 Jill Hill NaN
Desired output:
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith John Smith, Jane Smith
3 2 Jane Smith John Smith, Jane Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
Upvotes: 2
Views: 1947
Reputation: 294488
transform
df.join(df.groupby('ID').Name.transform('unique').rename('Multi Name'))
ID Name Multi Name
0 1 John Doe [John Doe, Jane Doe]
1 1 Jane Doe [John Doe, Jane Doe]
2 2 John Smith [John Smith, Jane Smith]
3 2 Jane Smith [John Smith, Jane Smith]
4 3 Jack Hill [Jack Hill]
5 4 Jill Hill [Jill Hill]
df.join(df.groupby('ID').Name.transform('unique').str.join(', ').rename('Multi Name'))
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith John Smith, Jane Smith
3 2 Jane Smith John Smith, Jane Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
map
df.join(df.ID.map(df.groupby('ID').Name.unique().str.join(', ')).rename('Multi Name'))
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith John Smith, Jane Smith
3 2 Jane Smith John Smith, Jane Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
itertools.groupby
from itertools import groupby
d = {
k: ', '.join(x[1] for x in v)
for k, v in groupby(sorted(set(zip(df.ID, df.Name))), key=lambda x: x[0])
}
df.join(df.ID.map(d).rename('Multi Name'))
ID Name Multi Name
0 1 John Doe Jane Doe, John Doe
1 1 Jane Doe Jane Doe, John Doe
2 2 John Smith Jane Smith, John Smith
3 2 Jane Smith Jane Smith, John Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
Upvotes: 5
Reputation: 153500
Use map
and join
:
df['Multi Name'] = df.groupby('ID')['Name'].transform('unique').map(', '.join)
Output:
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith John Smith, Jane Smith
3 2 Jane Smith John Smith, Jane Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
Upvotes: 3
Reputation: 402814
Looks like unique
is the wrong choice of function here. I'd recommend a custom lambda function using str.join
:
df['Multi Name'] = df.groupby('ID')['Name'].transform(lambda x: ', '.join(set(x)))
df
ID Name Multi Name
0 1 John Doe John Doe, Jane Doe
1 1 Jane Doe John Doe, Jane Doe
2 2 John Smith Jane Smith, John Smith
3 2 Jane Smith Jane Smith, John Smith
4 3 Jack Hill Jack Hill
5 4 Jill Hill Jill Hill
Upvotes: 5