Reputation: 43
Getting error while trying to merge two dataframes on datetime column
Where df1 has:
TimeStamp Value1
01-01-2019 00:00:00 v1
And df2 will have:
TimeStamp Value2
01-01-2019 9:23:52 v5
01-01-2019 10:33:52 v6
01-01-2019 11:43:52 v7
.
.
What i need:
TimeStamp Value2 Value1
01-01-2019 9:23:52 v5 v1
01-01-2019 10:33:52 v6 v1
01-01-2019 11:43:52 v7 v1
.
.
but currently df2 has:Empty dataframe, Columns:[TimeStamp, Value2]
Then the expected output should be:
TimeStamp Value2 Value1
01-01-2019 00:00:00 None v1
What i tried:
full_merged=df1.merge(df2, left_on=pd.to_datetime(df2['TimeStamp'],format='%Y-%m-%d'), right_on=pd.to_datetime(df1['TimeStamp'],format='%Y-%m-%d'),how='outer')
Getting below error:
raise KeyError(key) KeyError: 0 2019-08-27 Name: TimeStamp, dtype: datetime64[ns]
Any help would be highly appreciated.
Upvotes: 2
Views: 1401
Reputation: 30971
First of all reverse the merge direction. A more natural approach is that the "base" DataFrame is df2.
The second point to correct is that the merge should be on TimeStamp column, converted to DateTime and in case of df2 - with the time part "cancelled" (I assume that in both source DataFrames TimeStamp column is of string type).
Third detail is that:
Outer merge mode would give weird result, e.g. if df1 contained rows with dates absent in df2.
In this case, the result would contain rows with NaN in both TimeStamp and Value2 and the Value from these rows from df1, without "source" dates.
So, taking into account above remarks, I propose the following solution:
Generate a Series to merge:
take only Value1 column:
val1 = df1.set_index(pd.to_datetime(df1.TimeStamp)).Value1
Then perform merge of:
and drop the current index (create the new from scratch):
df2.set_index(pd.to_datetime(df2.TimeStamp).dt.floor('d'))\
.merge(val1, left_index=True, right_index=True, how='left')\
.reset_index(drop=True)
The result is:
TimeStamp Value2 Value1
0 01-01-2019 09:23:52 v5 v1
1 01-01-2019 10:33:52 v6 v1
2 01-01-2019 11:43:52 v7 v1
So, although the merge was on date part only, the result still has original TimeStamp from df2.
Your code failed also because the actual format of both TimeStamp columns
was "day-month-year", whereas you specified %Y-%m-%d
(year-month-day).
My advice: If not needed, don't specify the date / time format.
pd.to_datetime is clever enough to recognize the format from source data.
Upvotes: 1