Jonas Palačionis

Reputation: 4842

Changing value to be the maximum value per group

I have this kind of structure:

country     product      installs   purchases
US          T            100        100
US          A            5          5
AU          T            500        500   
AU          A            20         20   

I am trying to get:

country     product      installs   purchases
US          T            100        100
US          A            100        5
AU          T            500        500   
AU          A            500        20  

Within each country, every value in the installs column needs to be replaced by the installs value of the row where product is T.

I tried:

exp.groupby(['country','product'])['date_install_'] = max(exp.groupby(['country','product'])['date_install_'])

This does not work, and I am kind of lost. How can I achieve this result?

Upvotes: 1

Views: 34

Answers (2)

DanCor

Reputation: 338

If the T value is always the maximum within each country, you can build an auxiliary DataFrame that holds the maximum installs per country, then merge it back into the original DataFrame so that it replaces the installs column:

# Maximum installs per country (assumed to be the T row), as a two-column DataFrame
aux = df.groupby('country').installs.max().reset_index()
df = df.drop('installs', axis=1).merge(aux, how='left', on='country')

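The reset_index() call in the first line turns country from the index back into a column, so it can be used as the merge key. You drop installs before merging because aux already contains an installs column with the values you want. Note that after the merge, installs becomes the last column.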
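Below is a self-contained sketch, using the sample data from the question, that runs the merge approach above with reset_index() actually called. For comparison it also uses groupby(...).transform('max'), a swapped-in alternative that is not part of the answer and relies on the same assumption that the T row holds each country's maximum.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "country": ["US", "US", "AU", "AU"],
    "product": ["T", "A", "T", "A"],
    "installs": [100, 5, 500, 20],
    "purchases": [100, 5, 500, 20],
})

# Merge approach: per-country maximum as a DataFrame, joined back on country.
# how="left" keeps the rows of df in their original order.
aux = df.groupby("country")["installs"].max().reset_index()
merged = df.drop("installs", axis=1).merge(aux, how="left", on="country")

# Alternative under the same assumption: transform("max") returns a Series
# aligned with the original rows, so no merge or column drop is needed.
via_transform = df.copy()
via_transform["installs"] = df.groupby("country")["installs"].transform("max")

print(merged)
print(via_transform)
```

Because transform returns one value per original row, it also leaves the column order unchanged, whereas the merge moves installs to the end.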

Upvotes: 0

Tom

Reputation: 8800

Find the rows where the product is T, group them by country, and get the maximum installs. Use this as a map to replace the values in installs:

df['installs'] = df['country'].map(df[df['product'] == 'T'].groupby('country')['installs'].max())

Result:

  country product  installs  purchases
0      US       T       100        100
1      US       A       100          5
2      AU       T       500        500
3      AU       A       500         20

For clarity, this is what is being passed to map:

>>> df[df['product'] == 'T'].groupby('country')['installs'].max()

country
AU    500
US    100
Name: installs, dtype: int64

So map can use it like a dict, with the index (country) as the key and installs as the value.
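A minimal sketch of that dict-like behaviour: mapping with the Series and with its to_dict() form gives the same result, and a country with no T row gets NaN. The NZ row is hypothetical, added only to show the missing-key case, and the fillna fallback (keeping that row's own installs) is just one possible choice, not part of the answer.

```python
import pandas as pd

# Question data plus a hypothetical NZ row that has no T product
df = pd.DataFrame({
    "country": ["US", "US", "AU", "AU", "NZ"],
    "product": ["T", "A", "T", "A", "A"],
    "installs": [100, 5, 500, 20, 7],
    "purchases": [100, 5, 500, 20, 7],
})

# Installs of the T row per country (index: country)
t_installs = df[df["product"] == "T"].groupby("country")["installs"].max()

# map treats the Series exactly like a dict keyed by its index
from_series = df["country"].map(t_installs)
from_dict = df["country"].map(t_installs.to_dict())

# Countries missing from the mapping come back as NaN (so the dtype becomes float);
# here we fall back to the row's original installs and restore an integer dtype
filled = from_series.fillna(df["installs"]).astype(int)

print(from_series)
print(filled)
```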

Upvotes: 2
