ssnake

Reputation: 45

Setting values in grouped data frame in pandas

I have two data frames, each grouped by the same four keys. I would like to assign the mean of a column in one group to every row of that column in the matching group of the other frame. As I understand it, this is how it should be done:

g_test.get_group((1, 5, 13, 8)).monthly_sales = \
    g_train.get_group((1, 5, 13, 8)).monthly_sales.mean()

Except this does nothing. The values in monthly_sales of the group identified in g_test are unchanged. Can someone please explain what I am doing wrong and suggest alternatives?
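A minimal reproduction of the behaviour, using a small made-up `test` frame (the frame contents and the variable names `keys`, `group` and `unchanged` are assumptions for illustration, not my real data):

```python
import pandas as pd

keys = ["year", "month", "store", "item"]

# Hypothetical stand-in for the frame that g_test was built from.
# monthly_sales is stored as float so a mean can be written into it later.
test = pd.DataFrame({
    "year": [1, 1, 1],
    "month": [5, 5, 5],
    "day": [1, 2, 3],
    "store": [13, 13, 13],
    "item": [8, 8, 8],
    "monthly_sales": [0.0, 0.0, 0.0],
})
g_test = test.groupby(keys)

# get_group() hands back a new DataFrame, so this assignment only
# changes that separate object (pandas may emit a warning about it).
group = g_test.get_group((1, 5, 13, 8))
group["monthly_sales"] = 450.0

# The frame behind the groupby still holds the original zeros.
unchanged = bool((test["monthly_sales"] == 0).all())
print(unchanged)
```

Here `unchanged` comes out true: the write lands on the object returned by `get_group`, not on `test`.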

These are the first few rows of g_train.get_group((1, 5, 13, 8))

    year  month  day  store  item  units  monthly_sales
       1      5    5     13     8      4            466
       1      5    6     13     8     12            475
       1      5    0     13     8     22            469
       1      5    5     13     8     26            469
       1      5    6     13     8     39            480

and these are the first few rows of g_test.get_group((1, 5, 13, 8))

    year  month  day  store  item  monthly_sales
       1      5    1     13     8              0
       1      5    2     13     8              0
       1      5    3     13     8              0
       1      5    4     13     8              0
       1      5    5     13     8              0

Only the first few rows are shown, but the mean of g_train.get_group((1, 5, 13, 8)).monthly_sales over the whole group is 450, which I want copied into the monthly_sales column of the matching group in g_test.

Edit: I now understand that the code snippet below will work:

df1.loc[(df1.year == 1)
        & (df1.month == 5)
        & (df1.store == 13)
        & (df1.item == 8), 'monthly_sales'] = \
    gb2.get_group((1, 5, 13, 8)).monthly_sales.mean()

This works for copying the mean once. However, the whole reason I split the data frame into groups was to avoid these boolean checks and to repeat the operation for many different store and item numbers. Is there something else I can do?

Upvotes: 1

Views: 173

Answers (2)

ssnake

Reputation: 45

Actually, I just discovered a better way. g_test was built from the data frame 'test', so selecting the group's rows in 'test' by their index worked perfectly:

test.loc[g_test.get_group((1, 5, 13, 8)).index, 'monthly_sales'] = \
           g_train.get_group((1, 5, 13, 8)).monthly_sales.mean()
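To repeat this for every store and item, the same idea can be put in a loop over the groups. This is only a sketch with made-up data: the second group (item 9) and its values are invented, and the loop variable names are my own. `monthly_sales` in `test` is stored as float here so that a non-integer mean can be written into it.

```python
import pandas as pd

keys = ["year", "month", "store", "item"]

# Small made-up frames; item 9 and its sales values are invented.
train = pd.DataFrame({
    "year": [1] * 7,
    "month": [5] * 7,
    "store": [13] * 7,
    "item": [8, 8, 8, 8, 8, 9, 9],
    "monthly_sales": [466, 475, 469, 469, 480, 100, 200],
})
test = pd.DataFrame({
    "year": [1] * 5,
    "month": [5] * 5,
    "store": [13] * 5,
    "item": [8, 8, 8, 9, 9],
    "monthly_sales": [0.0] * 5,
})

g_train = train.groupby(keys)
g_test = test.groupby(keys)

# g_test.groups maps each key tuple to the index labels of its rows in test.
for key, row_labels in g_test.groups.items():
    if key in g_train.groups:  # skip groups that have no training data
        group_mean = g_train.get_group(key)["monthly_sales"].mean()
        test.loc[row_labels, "monthly_sales"] = group_mean

print(test)
```

Groups that exist only in `test` are left untouched by this loop.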

Upvotes: 0

Alexander

Reputation: 109520

You need to assign the result back to the DataFrame, not the groupby object. This should work:

df1.loc[(df1.year == 1) 
        & (df1.month == 5) 
        & (df1.store == 13) 
        & (df1.item == 8), 'monthly_sales'] = \
    gb2.get_group((1, 5, 13, 8)).monthly_sales.mean()

>>> gb1.get_group((1, 5, 13, 8))
   year  month  day  store  item  units  monthly_sales
0     1      5    5     13     8      4          471.8
1     1      5    6     13     8     12          471.8
2     1      5    0     13     8     22          471.8
3     1      5    5     13     8     26          471.8
4     1      5    6     13     8     39          471.8
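If the goal is to do this for every group at once, one possible alternative (a sketch, not necessarily the only or best way) is to compute all group means in a single groupby and merge them onto the target frame. The frame `df2`, the second group (item 9) and the helper column name `group_mean` are assumptions made up for this example.

```python
import pandas as pd

keys = ["year", "month", "store", "item"]

# df2 supplies the means, df1 receives them (made-up data; item 9 is invented).
df2 = pd.DataFrame({
    "year": [1] * 7,
    "month": [5] * 7,
    "store": [13] * 7,
    "item": [8, 8, 8, 8, 8, 9, 9],
    "monthly_sales": [466, 475, 469, 469, 480, 100, 200],
})
df1 = pd.DataFrame({
    "year": [1] * 5,
    "month": [5] * 5,
    "store": [13] * 5,
    "item": [8, 8, 8, 9, 9],
    "monthly_sales": [0] * 5,
})

# One row per key combination, holding that group's mean.
group_means = (
    df2.groupby(keys)["monthly_sales"]
    .mean()
    .rename("group_mean")
    .reset_index()
)

# A left merge keeps every row of df1 in its original order; key
# combinations that are missing from df2 end up as NaN.
merged = df1[keys].merge(group_means, on=keys, how="left")
df1["monthly_sales"] = merged["group_mean"].to_numpy()

print(df1)
```

This avoids writing one boolean mask per group, at the cost of overwriting the whole `monthly_sales` column of `df1`.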

Upvotes: 1
