Reputation: 173
what I have so far is a normal transactional dataframe with the following columns:
store | item | year | month | day | sales
'year' can be 2015, 2016, 2017.
With that I created a summary dataframe:
store_item_years = df.groupby(
['store','item','year'])['sales'].agg(
[np.sum, np.mean, np.std, np.median, np.min, np.max]).unstack(
fill_value=0)
The last one results in a Multi-Index with 2 levels, like this:
sum mean
year | 2015 | 2016 | 2017 | 2015 | 2016 | 2017 | ...
store | item sum1 ... ... mean1 mean2 ... | ...
Now I'd like to merge the summary table back onto the transactional one:
store | item | year | month | day | sales | + | sum+'by'+year | mean+'by'+year
2015 sum1 mean1
2016 sum2 mean2
2017 ... ...
I am trying to merge with the following:
df = pd.merge(df, store_item_years,
left_on=['store', 'item', 'year'],
right_on=['store', 'item', 'year'],
how='left')
which results in the following error:
KeyError: 'year'
Any ideas? I am just getting my head around groupby. I haven't looked into PivotTable yet.
Please keep in mind that the problem is simplified. The number of store_item combinations is 200+K and other groupbys having 300+ columns. But always the same principle.
Thanks a lot in advance.
Upvotes: 2
Views: 1452
Reputation: 173
found the culprit. Removed .unstack().
store_item_years = df.groupby(
['store','item','year'])['sales'].agg(
[np.sum, np.mean, np.std, np.median, np.min, np.max])
The following to keep the context:
store_item_years.columns = store_item_years.columns+'_by_year'
And merge like this:
pd.merge(df, store_item_years.reset_index(),
left_on=['store', 'item', 'year'],
right_on=['store', 'item', 'year'],
how='left')
Upvotes: 0