Reputation: 133
I am trying to merge the 2 DataFrames below to get an output where each code is listed on each date and the quantity is filled as 0 if the code was not in the original dataframe on that date. I have put an example of my input and desired output below but my live data will have over a years worth of dates and over 20,000 codes.
Input data:
df1
date
0 2021-05-03
1 2021-05-04
2 2021-05-05
3 2021-05-06
4 2021-05-07
5 2021-05-08
6 2021-05-09
7 2021-05-10
df2
date code qty
0 2021-05-03 A 2
1 2021-05-06 A 5
2 2021-05-07 A 4
3 2021-05-08 A 5
4 2021-05-10 A 6
5 2021-05-04 B 1
6 2021-05-08 B 4
Desired Output:
date code qty
03/05/2021 A 2
03/05/2021 B 0
04/05/2021 A 0
04/05/2021 B 1
05/05/2021 A 0
05/05/2021 B 0
06/05/2021 A 5
06/05/2021 B 0
07/05/2021 A 4
07/05/2021 B 0
08/05/2021 A 5
08/05/2021 B 4
09/05/2021 A 0
09/05/2021 B 0
10/05/2021 A 6
10/05/2021 B 0
I have tried the below merge but the output I get does not seem to be as desired:
df_new = df1.merge(df2, how='left', on='date')
date code qty
0 2021-05-03 A 2.0
1 2021-05-04 B 1.0
2 2021-05-05 NaN NaN
3 2021-05-06 A 5.0
4 2021-05-07 A 4.0
5 2021-05-08 A 5.0
6 2021-05-08 B 4.0
7 2021-05-09 NaN NaN
8 2021-05-10 A 6.0
Upvotes: 3
Views: 513
Reputation: 59529
This is better suited for a reindex
. You create all combinations, set the index, reindex to all of those combinations, fillna and then reset the index.
import pandas as pd
idx = pd.MultiIndex.from_product([df1.date, df2['code'].unique()],
names=['date', 'code'])
df2 = (df2.set_index(['date', 'code'])
.reindex(idx)
.fillna(0, downcast='infer')
.reset_index())
date code qty
0 2021-05-03 A 2
1 2021-05-03 B 0
2 2021-05-04 A 0
3 2021-05-04 B 1
4 2021-05-05 A 0
5 2021-05-05 B 0
6 2021-05-06 A 5
7 2021-05-06 B 0
8 2021-05-07 A 4
9 2021-05-07 B 0
10 2021-05-08 A 5
11 2021-05-08 B 4
12 2021-05-09 A 0
13 2021-05-09 B 0
14 2021-05-10 A 6
15 2021-05-10 B 0
Upvotes: 3
Reputation: 150735
One option with pivot
and stack
:
(df2.pivot_table(index='date', columns='code', fill_value=0)
.reindex(df1.date, fill_value=0)
.stack('code')
.reset_index()
)
Output:
date code qty
0 2021-05-03 A 2
1 2021-05-03 B 0
2 2021-05-04 A 0
3 2021-05-04 B 1
4 2021-05-05 A 0
5 2021-05-05 B 0
6 2021-05-06 A 5
7 2021-05-06 B 0
8 2021-05-07 A 4
9 2021-05-07 B 0
10 2021-05-08 A 5
11 2021-05-08 B 4
12 2021-05-09 A 0
13 2021-05-09 B 0
14 2021-05-10 A 6
15 2021-05-10 B 0
Upvotes: 3
Reputation: 34046
Do a cross-join
between df1
and unique vals
of code
. Then use df.fillna()
:
In [480]: x = pd.DataFrame(df2.code.unique())
In [483]: y = df1.assign(key=1).merge(x.assign(key=1), on='key').drop('key', 1).rename(columns={0: 'code'})
In [486]: res = y.merge(df2, how='left').fillna(0)
In [487]: res
Out[487]:
date code qty
0 2021-05-03 A 2.0
1 2021-05-03 B 0.0
2 2021-05-04 A 0.0
3 2021-05-04 B 1.0
4 2021-05-05 A 0.0
5 2021-05-05 B 0.0
6 2021-05-06 A 5.0
7 2021-05-06 B 0.0
8 2021-05-07 A 4.0
9 2021-05-07 B 0.0
10 2021-05-08 A 5.0
11 2021-05-08 B 4.0
12 2021-05-09 A 0.0
13 2021-05-09 B 0.0
14 2021-05-10 A 6.0
15 2021-05-10 B 0.0
Upvotes: 2