Changing value to be the maximum value per group

Question

I have this kind of structure:

country     product      installs   purchases
US          T            100        100
US          A            5          5
AU          T            500        500   
AU          A            20         20

I am trying to get:

country     product      installs   purchases
US          T            100        100
US          A            100        5
AU          T            500        500   
AU          A            500        20

Within each country, every value in the installs column needs to be the installs value from the row where product is T.

I tried:

exp.groupby(['country','product'])['date_install_'] = max(exp.groupby(['country','product'])['date_install_'])

This does not work, and I am a bit lost. How can I achieve this result?
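Two things likely go wrong in this attempt. First, a GroupBy object is not a writable view of the DataFrame, so assigning to exp.groupby(...)[...] raises a TypeError instead of updating exp. Second, grouping by both country and product puts every row of this sample in its own group, so a per-group maximum just returns each row's own value. A minimal sketch on the sample data, using installs in place of the date_install_ column from the attempt (that column is not in the table shown, so the substitution is an assumption):

```python
import pandas as pd

# Sample data from the question
exp = pd.DataFrame({
    "country": ["US", "US", "AU", "AU"],
    "product": ["T", "A", "T", "A"],
    "installs": [100, 5, 500, 20],
    "purchases": [100, 5, 500, 20],
})

# DataFrameGroupBy does not support item assignment, so this raises
# TypeError and exp is left unchanged.
try:
    exp.groupby(["country", "product"])["installs"] = 0
    assignment_failed = False
except TypeError as err:
    assignment_failed = True
    print(err)

# Each (country, product) pair is unique here, so the per-group max
# broadcast back to the rows is identical to the original column.
per_group_max = exp.groupby(["country", "product"])["installs"].transform("max")
same_as_original = per_group_max.equals(exp["installs"])
print(same_as_original)
```

The fix is therefore to group by country only and to restrict the values to the T rows before taking the maximum.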

Tom · Accepted Answer

Find the rows where product is T, group them by country, and take the maximum of installs. Then use this as a map to replace the values in installs:

df['installs'] = df['country'].map(df[df['product'] == 'T'].groupby('country')['installs'].max())

Result:

  country product  installs  purchases
0      US       T       100        100
1      US       A       100          5
2      AU       T       500        500
3      AU       A       500         20
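A self-contained version of the answer, building the sample frame from the question's table so the result above can be reproduced. The intermediate name t_installs is introduced here only for readability; it is not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "country": ["US", "US", "AU", "AU"],
    "product": ["T", "A", "T", "A"],
    "installs": [100, 5, 500, 20],
    "purchases": [100, 5, 500, 20],
})

# Per-country installs, taken from the T rows only
t_installs = df[df["product"] == "T"].groupby("country")["installs"].max()

# Look up each row's country in that Series and overwrite installs
df["installs"] = df["country"].map(t_installs)
print(df)
```

If a country has no T row, map finds no key for it, so those rows get NaN and the installs column becomes float.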

For clarity, this is what is being passed to map:

>>> df[df['product'] == 'T'].groupby('country')['installs'].max()

country
AU    500
US    100
Name: installs, dtype: int64

map treats this Series like a dict, with the index (country) as the key and installs as the value.
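A variant that avoids building a separate lookup Series combines Series.where with groupby(...).transform. This is a swapped-in technique, not the accepted answer's method: where blanks out installs on non-T rows (they become NaN), and transform("max") skips the NaN values and broadcasts each country's maximum back to every row of that country. A minimal sketch, assuming every country has a T row so the result can safely be cast back to integers:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "country": ["US", "US", "AU", "AU"],
    "product": ["T", "A", "T", "A"],
    "installs": [100, 5, 500, 20],
    "purchases": [100, 5, 500, 20],
})

# Keep installs only where product is T; other rows become NaN
t_only = df["installs"].where(df["product"] == "T")

# Max per country (NaN ignored), aligned back to the original rows;
# astype(int) undoes the float dtype introduced by NaN
df["installs"] = t_only.groupby(df["country"]).transform("max").astype(int)
print(df)
```

Drop the astype(int) if some countries may lack a T row: their rows stay NaN, and casting NaN to int would raise an error.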
