ajrlewis
ajrlewis

Reputation: 3058

Merging a groupby object with another data frame

I have a data frame with repesenting the sales of an item:

import pandas as pd

data = {'id': [1,1,1,1,2,2], 'week': [1,2,2,3,1,3], 'quantity': [1,2,4,3,2,2]}
df_sales = pd.DataFrame(data)
🐍 >>> df_sales
   id  week  quantity
0   1     1         1
1   1     2         2
2   1     3         3
3   2     1         2
4   2     3         2

I have another data frame that represents the available weeks:

data = {'week': [1,2,3]}
df_week = pd.DataFrame(data)
🐍 >>> df_week
   week
0     1
1     2
2     3

I want to groupby the id and the week and compute the mean, which I do as follows:

df = df_sales.groupby(by=['id', 'week'], as_index=False).mean()
🐍 >>> df
   id  week  quantity
0   1     1         1
1   1     2         3
2   1     3         3
3   2     1         2
4   2     3         2

However, I want to fill the missing week values (present in df_week) with 0, such that the output is:

🐍 >>> df
   id  week  quantity
0   1     1         1
1   1     2         3
2   1     3         3
3   2     1         2
4   2     2         0
4   2     3         2

Is it possible to merge the groupby with the df_week data frame?

Upvotes: 2

Views: 225

Answers (2)

Mustafa Aydın
Mustafa Aydın

Reputation: 18306

Since all unique id and week combinations are needed in the result, one way is to first prepare a combinations frame with pd.merge passed how="cross":

combs = pd.merge(df_sales.id.drop_duplicates(), df_week.week, how="cross")

or for versions below 1.2

combs = pd.merge(df_sales.id.drop_duplicates().to_frame().assign(key=1),
                 df_week.week.to_frame().assign(key=1), on="key").drop(columns="key")

which gives

>>> combs

   id  week
0   1     1
1   1     2
2   1     3
3   2     1
4   2     2
5   2     3

Now we can left merge this with df that has the means filling NaNs with 0:

result = combs.merge(df, how="left", on=["id", "week"]).fillna(0, downcast="infer")

where downcast is to go back to integers from float type because of NaN(s) that appeared in the intermediate step,

to get

>>> result

   id  week  quantity
0   1     1         1
1   1     2         3
2   1     3         3
3   2     1         2
4   2     2         0
5   2     3         2

Upvotes: 1

Shubham Sharma
Shubham Sharma

Reputation: 71689

We can reindex after groupby

# group and aggregate
df  = df_sales.groupby(['id', 'week']).mean()

# define new MultiIndex
idx = pd.MultiIndex.from_product([df.index.levels[0], df_week['week']])

# reindex with fill_value=0
df  = df.reindex(idx, fill_value=0).reset_index()

print(df)

   id  week  quantity
0   1     1         1
1   1     2         3
2   1     3         3
3   2     1         2
4   2     2         0
5   2     3         2

Upvotes: 2

Related Questions