Reputation: 12172
I am most interested in how this is done in a good and excellent pandas way.
In this example data Tim
from Osaka
has two fruit
's.
import pandas as pd
data = {'name': ['Susan', 'Tim', 'Tim', 'Anna'],
'fruit': ['Apple', 'Apple', 'Banana', 'Banana'],
'town': ['Berlin', 'Osaka', 'Osaka', 'Singabpur']}
df = pd.DataFrame(data)
print(df)
Result
name fruit town
0 Susan Apple Berlin
1 Tim Apple Osaka
2 Tim Banana Osaka
3 Anna Banana Singabpur
I investigate the data ans see that one of the persons have multiple fruits. I want to create a new "category" for it named banana&fruit
(or something else). The point is that the other fields of Tim
are equal in their values.
df.groupby(['name', 'town', 'fruit']).size()
I am not sure if this is the correct way to explore this data set. The logical question behind is if some of the person+town combinations have multiple fruits.
As a result I want this
name fruit town
0 Susan Apple Berlin
1 Tim Apple&Banana Osaka
2 Anna Banana Singabpur
Upvotes: 2
Views: 463
Reputation: 35646
Use groupby agg
:
new_df = (
df.groupby(['name', 'town'], as_index=False, sort=False)
.agg(fruit=('fruit', '&'.join))
)
new_df
:
name town fruit
0 Susan Berlin Apple
1 Tim Osaka Apple&Banana
2 Anna Singabpur Banana
Upvotes: 3
Reputation: 120439
>>> df.groupby(["name", "town"], sort=False)["fruit"]
.apply(lambda f: "&".join(f)).reset_index()
name town fruit
0 Anna Singabpur Banana
1 Susan Berlin Apple
2 Tim Osaka Apple&Banana
Upvotes: 2