zesla
zesla

Reputation: 11793

How to join a table with each group of a dataframe in pandas

I have a dataframe like below. Each date is Monday of each week.

df = pd.DataFrame({'date' :['2020-04-20', '2020-05-11','2020-05-18',
                                 '2020-04-20', '2020-04-27','2020-05-04','2020-05-18'],
                         'name': ['A', 'A', 'A', 'B', 'B', 'B', 'B'], 
                          'count': [23, 44, 125, 6, 9, 10, 122]})

    date      name  count
0   2020-04-20  A   23
1   2020-05-11  A   44
2   2020-05-18  A   125
3   2020-04-20  B   6
4   2020-04-27  B   9
5   2020-05-04  B   10
6   2020-05-18  B   122

Neither 'A' and 'B' covers the whole date range. Both of them have some missing dates, which means the counts on that week is 0. Below is all the dates:

df_dates = pd.DataFrame({ 'date':['2020-04-20', '2020-04-27','2020-05-04','2020-05-11','2020-05-18'] }) 

So what I need is like the dataframe below:

    date      name  count
0   2020-04-20  A   23
1   2020-04-27  A   0
2   2020-05-04  A   0
3   2020-05-11  A   44
4   2020-05-18  A   125
5   2020-04-20  B   6
6   2020-04-27  B   9
7   2020-05-04  B   10
8   2020-05-11  B   0
9   2020-05-18  B   122

It seems like I need to join (merge) df_dates with df for each name group ( A and B) and then fill the data with missing name and missing count value with 0's. Does anyone know achieve that? how I can join with another table with a grouped table?

I tried and no luck...

pd.merge(df_dates, df.groupby('name'), how='left', on='date')

Upvotes: 1

Views: 216

Answers (2)

BENY
BENY

Reputation: 323276

We can do reindex with multiple index creation

idx=pd.MultiIndex.from_product([df_dates.date,df.name.unique()],names=['date','name'])  
s=df.set_index(['date','name']).reindex(idx,fill_value=0).reset_index().sort_values('name')
Out[136]: 
         date name  count
0  2020-04-20    A     23
2  2020-04-27    A      0
4  2020-05-04    A      0
6  2020-05-11    A     44
8  2020-05-18    A    125
1  2020-04-20    B      6
3  2020-04-27    B      9
5  2020-05-04    B     10
7  2020-05-11    B      0
9  2020-05-18    B    122

Or

s=df.pivot(*df.columns).reindex(df_dates.date).fillna(0).reset_index().melt('date')
Out[145]: 
         date name  value
0  2020-04-20    A   23.0
1  2020-04-27    A    0.0
2  2020-05-04    A    0.0
3  2020-05-11    A   44.0
4  2020-05-18    A  125.0
5  2020-04-20    B    6.0
6  2020-04-27    B    9.0
7  2020-05-04    B   10.0
8  2020-05-11    B    0.0
9  2020-05-18    B  122.0

Upvotes: 3

Quang Hoang
Quang Hoang

Reputation: 150745

If you are looking for just fill in the union of dates in df, you can do:

(df.set_index(['date','name'])
   .unstack('date',fill_value=0)
   .stack().reset_index()
)

Output:

  name        date  count
0    A  2020-04-20     23
1    A  2020-04-27      0
2    A  2020-05-04      0
3    A  2020-05-11     44
4    A  2020-05-18    125
5    B  2020-04-20      6
6    B  2020-04-27      9
7    B  2020-05-04     10
8    B  2020-05-11      0
9    B  2020-05-18    122

Upvotes: 1

Related Questions