David
David

Reputation: 891

change values in groupby object with multi index

I have the following df that I group and then try to change values of US with the overall per type sum, but I get nan:

dff=pd.DataFrame({'country':['US','US','UK','UK','FR','FR','ES','ES'],'type':['A','B','C','B','A','C','A','B'],'sales':[100,200,100,100,50,10,20,40]})

grouped=dff.groupby(['country','type'])['sales'].sum()

grouped['US']=dff.loc[dff.country!='US'].groupby('type')['sales'].sum()

print(grouped)

enter image description here

And I want to get:

US   A   70

US   B   140

Upvotes: 0

Views: 193

Answers (3)

sammywemmy
sammywemmy

Reputation: 28699

We can use pivot to assign the values to US :

dff=pd.DataFrame({'country':['US','US','UK','UK','FR','FR','ES','ES'],'type':['A','B','C','B','A','C','A','B'],'sales':[100,200,100,100,50,10,20,40]})

#use as_index=False to keep group keys within the dataframe
grouped=dff.groupby(['country','type'],as_index=False)['sales'].sum()

grouped.loc[lambda x: x.country.eq("US"), "sales"] = (grouped
                                                       #filter out US
                                                       #and search for A and B in the type column
                                                      .query("country != 'US' and type in ['A','B']")
                                                      .pivot(columns='type',values='sales')
                                                      .sum()
                                                      .array)

grouped

    country type    sales
0     ES    A        20
1     ES    B        40
2     FR    A        50
3     FR    C        10
4     UK    B        100
5     UK    C        100
6     US    A        70
7     US    B        140

Upvotes: 1

dimay
dimay

Reputation: 2804

When you call

dff.loc[dff.country != 'US'].groupby('type')['sales'].sum()

you get 3 values

type
A     70
B    140
C    110
Name: sales, dtype: int64

But

grouped["US"]

has only 2 values

type
A     70
B    140
Name: sales, dtype: int64

You can try it:

dff.loc[dff.country != 'US'].groupby('type')['sales'].sum()[["A", "B"]]
type
A     70
B    140
Name: sales, dtype: int64

Upvotes: 1

jezrael
jezrael

Reputation: 862801

Create MultiIndex by MultiIndex.from_product for assign Series to US level:

s = dff.loc[dff.country!='US'].groupby('type')['sales'].sum()
s.index = pd.MultiIndex.from_product([['US'], s.index])
grouped['US'] = s

print (grouped)
country  type
ES       A        20
         B        40
FR       A        50
         C        10
UK       B       100
         C       100
US       A        70
         B       140
Name: sales, dtype: int64

Upvotes: 1

Related Questions