Combine two rows in Pandas depending on values in a column and creating a new category

Question

I am mainly interested in how to do this in an idiomatic pandas way.

In this example data, Tim from Osaka has two fruits.

import pandas as pd

data = {'name': ['Susan', 'Tim', 'Tim', 'Anna'],
        'fruit': ['Apple', 'Apple', 'Banana', 'Banana'],
        'town': ['Berlin', 'Osaka', 'Osaka', 'Singabpur']}

df = pd.DataFrame(data)
print(df)

Result

    name   fruit       town
0  Susan   Apple     Berlin
1    Tim   Apple      Osaka
2    Tim  Banana      Osaka
3   Anna  Banana  Singabpur

I investigated the data and saw that one of the persons has multiple fruits. I want to create a new "category" for that case, named Apple&Banana (or something similar). The point is that Tim's other fields have equal values in both rows.

df.groupby(['name', 'town', 'fruit']).size()

I am not sure if this is the correct way to explore this data set. The underlying question is whether any of the name+town combinations have multiple fruits.
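A minimal sketch of one way to check this, assuming the same example data as above: count the distinct fruits per name+town combination with `nunique` and keep only the combinations where the count exceeds one. The variable names `fruit_counts` and `multi_fruit` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Susan', 'Tim', 'Tim', 'Anna'],
    'fruit': ['Apple', 'Apple', 'Banana', 'Banana'],
    'town': ['Berlin', 'Osaka', 'Osaka', 'Singabpur'],
})

# Count distinct fruits for each name+town combination.
fruit_counts = df.groupby(['name', 'town'])['fruit'].nunique()

# Keep only the combinations that have more than one distinct fruit.
multi_fruit = fruit_counts[fruit_counts > 1]
print(multi_fruit)
```

Grouping by `['name', 'town', 'fruit']` and calling `size()` instead counts how often each exact row occurs, which does not directly show which persons have several fruits.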

As a result I want this

    name         fruit       town
0  Susan         Apple     Berlin
1    Tim  Apple&Banana      Osaka
2   Anna        Banana  Singabpur

Henry Ecker · Accepted Answer

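Use groupby with agg, grouping on the columns that stay equal (name and town) and joining the fruit values of each group with '&':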

new_df = (
    df.groupby(['name', 'town'], as_index=False, sort=False)
        .agg(fruit=('fruit', '&'.join))
)

new_df:

    name       town         fruit
0  Susan     Berlin         Apple
1    Tim      Osaka  Apple&Banana
2   Anna  Singabpur        Banana
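If the real data can contain repeated rows or fruits in varying order, plain `'&'.join` would produce labels such as `Banana&Apple&Banana`. The following is a sketch of one possible variant, not part of the original answer: the helper `join_unique` is a hypothetical name, and the extra duplicate row for Tim is added here purely for illustration.

```python
import pandas as pd

# Example data with an extra, out-of-order duplicate row for Tim (assumed for illustration).
df = pd.DataFrame({
    'name': ['Susan', 'Tim', 'Tim', 'Tim', 'Anna'],
    'fruit': ['Apple', 'Banana', 'Apple', 'Banana', 'Banana'],
    'town': ['Berlin', 'Osaka', 'Osaka', 'Osaka', 'Singabpur'],
})

def join_unique(fruits):
    """Join the distinct values of a group, sorted so the label is stable."""
    return '&'.join(sorted(fruits.unique()))

new_df = (
    df.groupby(['name', 'town'], as_index=False, sort=False)
      .agg(fruit=('fruit', join_unique))
)
print(new_df)
```

Sorting the distinct values means the same set of fruits always yields the same category label, regardless of row order in the input.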
