Jonas Palačionis
Jonas Palačionis

Reputation: 4842

Merge df with itself grouped

I have a df

id         qty      location
1          1        Paris
1          10       Rome
1          50       London
2          100      Paris
2          40       Oslo
3          2        Paris
3          9        Washington
3          10       London

I am trying to get the total sum of qty and sum for Paris, Rome and other location so that the df would look like this:

id      qty_total       qty_Paris       qty_Rome        qty_other
1       61              1               10              50
2       140             100             0               40
3       21              2               0               19

What I tried:

df_temp = df.groupby('id')['qty'].sum()
df_temp['qty_Paris'] = df[df.groupby(['id', 'location'])[['qty']]
                       .sum().reset_index()['location']=='Paris']

But I get an error:

Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

How should I approach this problem so that I would get the df I desire?

Upvotes: 1

Views: 54

Answers (1)

Nk03
Nk03

Reputation: 14949

One way:

df.loc[~df.location.isin(['Paris', 'Rome']), 'location'] = 'Others'
df1 = df.pivot_table(index='id', columns='location', values='qty',
                     aggfunc=sum, fill_value=0,).add_prefix('qty_').rename_axis(columns=None)
df1['qty_total'] = df1.sum(1)
OUTPUT:
    qty_Others  qty_Paris  qty_Rome  qty_total
id                                            
1           50          1        10         61
2           40        100         0        140
3           19          2         0         21

Complete Example:

df = pd.DataFrame({'id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 3},
                   'qty': {0: 1, 1: 10, 2: 50, 3: 100, 4: 40, 5: 2, 6: 9, 7: 10},
                   'location': {0: 'Paris',
                                1: 'Rome',
                                2: 'London',
                                3: 'Paris',
                                4: 'Oslo',
                                5: 'Paris',
                                6: 'Washington',
                                7: 'London'}})
df.loc[~df.location.isin(['Paris', 'Rome']), 'location'] = 'Others'
df1 = df.pivot_table(index='id', columns='location', values='qty',
                     aggfunc=sum, fill_value=0,).add_prefix('qty_').rename_axis(columns=None)
df1['qty_total'] = df1.sum(1)

Upvotes: 4

Related Questions