Reputation: 4842
I have this kind of structure:
country  product  installs  purchases
US       T        100       100
US       A        5         5
AU       T        500       500
AU       A        20        20
I am trying to get:
country  product  installs  purchases
US       T        100       100
US       A        100       5
AU       T        500       500
AU       A        500       20
Each value in the installs column needs to be the value of installs from the row of the same country where the product column's value is T.
I tried:
exp.groupby(['country','product'])['date_install_'] = max(exp.groupby(['country','product'])['date_install_'])
Which does not work and I am kind of lost. How can I achieve the result?
Upvotes: 1
Reputation: 338
If the T value is always the maximum for its country, you can build an auxiliary df that holds the maximum of installs per country, then merge it with the original df so that the per-country maximum replaces each row's installs value:
aux = df.groupby('country').installs.max().reset_index()
df = df.drop('installs', axis=1).merge(aux, how='left', on='country')
You reset the index so that you can use country as a column in the first line. You drop installs before you merge because the aux df already has the value and name of the installs you want.
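As a minimal, self-contained sketch of the merge approach, the snippet below rebuilds the sample data from the question and runs both steps. The names df, aux and merged are only illustrative, and the approach still assumes the T row holds the largest installs value in each country.

```python
import pandas as pd

# Sample frame rebuilt from the question (the variable names are assumptions).
df = pd.DataFrame({
    "country": ["US", "US", "AU", "AU"],
    "product": ["T", "A", "T", "A"],
    "installs": [100, 5, 500, 20],
    "purchases": [100, 5, 500, 20],
})

# Per-country maximum of installs, keeping country as a regular column.
aux = df.groupby("country", as_index=False)["installs"].max()

# Drop the old installs column, then pull the per-country maximum back in.
merged = df.drop(columns="installs").merge(aux, how="left", on="country")

# merge puts installs last; select the columns in their original order.
merged = merged[df.columns]
print(merged)
```

Note that merge returns a new DataFrame rather than modifying df in place, so the result has to be assigned to a name.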
Upvotes: 0
Reputation: 8800
Find the rows where the product is T, group by the country, and get the maximum of the installs. Use this as a map to replace the values in installs:
df['installs'] = df['country'].map(df[df['product'] == 'T'].groupby('country')['installs'].max())
Result:
country product installs purchases
0 US T 100 100
1 US A 100 5
2 AU T 500 500
3 AU A 500 20
For clarity, this is what is being passed to map:
>>> df[df['product'] == 'T'].groupby('country')['installs'].max()
country
AU 500
US 100
Name: installs, dtype: int64
So you can use it like a dict with the index (country) as a key and the installs as a value.
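To try this end to end, here is a small sketch that rebuilds the sample data from the question and shows that mapping with the Series gives the same result as mapping with a plain dict. The names t_installs, as_dict and via_dict are made up for illustration.

```python
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame({
    "country": ["US", "US", "AU", "AU"],
    "product": ["T", "A", "T", "A"],
    "installs": [100, 5, 500, 20],
    "purchases": [100, 5, 500, 20],
})

# Series indexed by country, holding the installs of the T rows only.
t_installs = df[df["product"] == "T"].groupby("country")["installs"].max()

# The same lookup table as an explicit dict: country -> installs.
as_dict = t_installs.to_dict()

# map looks up each row's country in the Series index (or the dict keys).
via_dict = df["country"].map(as_dict)
df["installs"] = df["country"].map(t_installs)
print(df)
```

Because only the T rows feed the lookup, this does not rely on T having the largest installs in a country. A country with no T row at all would end up with NaN in installs.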
Upvotes: 2