Reputation: 1613
I have two dataframes. The first one is just a column of daily datetimes, whereas the second one has both dates and data. This is an example:
df1 = pd.DataFrame({'Date': pd.date_range(start="2002-01-01",end="2021-04-14")})
df2 = pd.DataFrame.from_dict({'Date1': {0: '2002-01-01 00:00:00',
1: '2002-01-02 00:00:00',
2: '2002-01-03 00:00:00',
3: '2002-01-04 00:00:00',
4: '2002-01-07 00:00:00',
5: '2002-01-08 00:00:00',
6: '2002-01-09 00:00:00',
7: '2002-01-10 00:00:00',
8: '2002-01-11 00:00:00',
9: '2002-01-14 00:00:00',
10: '2002-01-15 00:00:00',
11: '2002-01-16 00:00:00',
12: '2002-01-17 00:00:00',
13: '2002-01-18 00:00:00',
14: '2002-01-21 00:00:00'},
'GSEAFCI': {0: 98.85,
1: 98.96,
2: 98.88,
3: 98.85,
4: 98.83,
5: 98.86,
6: 98.87,
7: 98.9,
8: 98.87,
9: 98.91,
10: 98.84,
11: 98.87,
12: 98.84,
13: 98.84,
14: 98.87},
'Date2': {0: '2004-03-03 00:00:00',
1: '2004-03-04 00:00:00',
2: '2004-03-05 00:00:00',
3: '2004-03-08 00:00:00',
4: '2004-03-09 00:00:00',
5: '2004-03-10 00:00:00',
6: '2004-03-11 00:00:00',
7: '2004-03-12 00:00:00',
8: '2004-03-15 00:00:00',
9: '2004-03-16 00:00:00',
10: '2004-03-17 00:00:00',
11: '2004-03-18 00:00:00',
12: '2004-03-19 00:00:00',
13: '2004-03-22 00:00:00',
14: '2004-03-23 00:00:00'},
'2Y2YINF': {0: 3.9,
1: 3.832,
2: 3.668,
3: 3.63,
4: 3.649,
5: 3.636,
6: 3.63,
7: 3.633,
8: 3.628,
9: 3.665,
10: 3.621,
11: 3.62,
12: 3.616,
13: 3.588,
14: 3.5949999999999998},
'Date3': {0: '2002-01-02 00:00:00',
1: '2002-01-03 00:00:00',
2: '2002-01-04 00:00:00',
3: '2002-01-07 00:00:00',
4: '2002-01-08 00:00:00',
5: '2002-01-09 00:00:00',
6: '2002-01-10 00:00:00',
7: '2002-01-11 00:00:00',
8: '2002-01-14 00:00:00',
9: '2002-01-15 00:00:00',
10: '2002-01-16 00:00:00',
11: '2002-01-17 00:00:00',
12: '2002-01-18 00:00:00',
13: '2002-01-21 00:00:00',
14: '2002-01-22 00:00:00'},
'5Y5YINF': {0: 5.9274000000000004,
1: 5.8767,
2: 5.8977,
3: 5.763,
4: 5.8056,
5: 5.825,
6: 5.7287,
7: 5.6922,
8: 5.6222,
9: 5.6145,
10: 5.6655999999999995,
11: 5.76,
12: 5.7455,
13: 5.718,
14: 5.7589}})
What I want to do is merge df1 and df2 into one dataframe (dataset) in which each value sits on the row of its own date, and any date with no data just gets NaN. This is what I did:
dataset = df1.merge(df2, how = 'left', left_on= ['Date', 'Date','Date'],
right_on=['Date1','Date2','Date3'])
Unfortunately, I get only NaNs or NaTs everywhere.
Instead, I would like to get:
                  Date  GSEAFCI  2Y2YINF  5Y5YINF
0  2002-01-01 00:00:00    98.85      NaN   5.9274
1  2002-01-02 00:00:00    98.96      NaN   5.8767
2  2002-01-03 00:00:00    98.88      NaN   5.8977
3  2002-01-04 00:00:00    98.85      NaN   5.7630
4  2002-01-05 00:00:00      NaN      NaN      NaN
5  2002-01-06 00:00:00      NaN      NaN      NaN
6  2002-01-07 00:00:00    98.83      NaN   5.8056
7  2002-01-08 00:00:00    98.86      NaN   5.8250
8  2002-01-09 00:00:00    98.87      NaN   5.7287
...
...
Can anyone help me with this?
Thanks!
Upvotes: 0
Views: 72
Reputation: 862511
If the columns follow the same pattern throughout (a datetime column followed by its data column), you can build one Series per pair in a list comprehension and combine them with concat, then convert the index values to datetimes and use DataFrame.merge with the first DataFrame (matching the index name Date against the column name Date):
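# Columns alternate: date column, value column, date column, value column, ...
a = df2.columns[::2]
b = df2.columns[1::2]
# One Series per (date, value) pair, indexed by that pair's own dates
dfs = [df2.set_index(d)[v] for d, v in zip(a, b)]
# Align the Series on the union of their dates
df = pd.concat(dfs, axis=1, sort=False).rename_axis('Date')
df.index = pd.to_datetime(df.index)
# Left-merge onto the daily calendar so days without data become NaN
df = df1.merge(df, on='Date', how='left')
print(df.head(10))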
        Date  GSEAFCI  2Y2YINF  5Y5YINF
0 2002-01-01    98.85      NaN      NaN
1 2002-01-02    98.96      NaN   5.9274
2 2002-01-03    98.88      NaN   5.8767
3 2002-01-04    98.85      NaN   5.8977
4 2002-01-05      NaN      NaN      NaN
5 2002-01-06      NaN      NaN      NaN
6 2002-01-07    98.83      NaN   5.7630
7 2002-01-08    98.86      NaN   5.8056
8 2002-01-09    98.87      NaN   5.8250
9 2002-01-10    98.90      NaN   5.7287
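For context, the merge in the question matches a row only when Date equals Date1, Date2 and Date3 at the same time, which never happens when each series has its own dates. As a variant of the same idea, here is a minimal sketch that left-merges each date/value pair onto the calendar one at a time. It assumes the columns strictly alternate date, value; the helper name merge_pairs and the small sample frames are made up for illustration. The dropna call is an added precaution for real data where the pairs have different lengths; the sample in the question does not need it.

```python
import pandas as pd

def merge_pairs(calendar, wide, date_col="Date"):
    """Left-merge each (date, value) column pair of `wide` onto `calendar`.

    Assumes the columns of `wide` alternate: date, value, date, value, ...
    (hypothetical helper, not part of pandas).
    """
    out = calendar.copy()
    date_cols = wide.columns[::2]
    value_cols = wide.columns[1::2]
    for d, v in zip(date_cols, value_cols):
        # Keep only rows that actually have a date for this pair
        pair = wide[[d, v]].dropna(subset=[d]).copy()
        # The dates are stored as strings, so convert before merging
        pair[d] = pd.to_datetime(pair[d])
        pair = pair.rename(columns={d: date_col})
        out = out.merge(pair, on=date_col, how="left")
    return out

# Small made-up sample with pairs of different lengths
calendar = pd.DataFrame({"Date": pd.date_range("2002-01-01", "2002-01-04")})
wide = pd.DataFrame({
    "Date1": ["2002-01-01 00:00:00", "2002-01-02 00:00:00", "2002-01-04 00:00:00"],
    "A": [98.85, 98.96, 98.85],
    "Date2": ["2002-01-02 00:00:00", "2002-01-03 00:00:00", None],
    "B": [5.9274, 5.8767, None],
})
result = merge_pairs(calendar, wide)
print(result)
```

With the frames from the question, the call would be merge_pairs(df1, df2). Repeated merges are slower than a single concat on large data, but they tolerate missing dates in the shorter pairs.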
Upvotes: 1