user9092346
user9092346

Reputation:

Pandas summarizing data into string using groupby

I have two dataframes. The first is longitudinal and has three columns 'Customer', 'Product' and 'Category'. It has one entry per item that a customer buys.

The second should contain one row per Customer and one column per Category. In each of these columns it should give all Products this Customer bought from the respective category as a single string separated by blanks.

df_1 = {'Customer': [1, 1, 1, 1, 2, 2, 2, 2],
        'Product': ['Apple', 'Chocolate', 'Orange', 'Cookie', 'Banana', 'Apple', 'Cookie', 'Apple'],
        'Category': ['Fruit', 'Sweets', 'Fruit', 'Sweets', 'Fruit', 'Fruit', 'Sweets', 'Fruit']}

The target DataFrame should look like this:

df_2 = {'Customer': [1, 2],
        'Fruits': ['Apple Orange', 'Banana Apple Apple'],
        'Sweets': ['Chocolate Cookie', 'Cookie']}

How to best approach this.

Upvotes: 0

Views: 48

Answers (2)

Riccardo Bucco
Riccardo Bucco

Reputation: 15364

Here's an option:

df_1.groupby(['Customer', 'Category'])['Product'].apply(' '.join).unstack()

Upvotes: 0

Nk03
Nk03

Reputation: 14949

Use pivot_table with aggfunc=' '.join:

df_2 = (
    (df_1.pivot_table(
        index='Customer',
        columns='Category',
        values='Product',
        aggfunc=' '.join)
     )
    .rename_axis(columns=None)
    .reset_index()
)

OUTPUT:

   Customer               Fruit            Sweets
0         1        Apple Orange  Chocolate Cookie
1         2  Banana Apple Apple            Cookie

Upvotes: 0

Related Questions