Ayesha Khan
Ayesha Khan

Reputation: 43

Getting KeyError while merging pandas dataframe: Name: TimeStamp, dtype: datetime64[ns]

Getting error while trying to merge two dataframes on datetime column

Where df1 has:

TimeStamp                Value1

01-01-2019 00:00:00       v1

And df2 will have:

TimeStamp                Value2

01-01-2019 9:23:52        v5
01-01-2019 10:33:52       v6
01-01-2019 11:43:52       v7
.
.

What i need:

TimeStamp                Value2    Value1

01-01-2019 9:23:52        v5        v1
01-01-2019 10:33:52       v6        v1
01-01-2019 11:43:52       v7        v1
.
.

but currently df2 has:Empty dataframe, Columns:[TimeStamp, Value2]

Then the expected output should be:

TimeStamp                        Value2    Value1

01-01-2019  00:00:00              None        v1

What i tried:

full_merged=df1.merge(df2, left_on=pd.to_datetime(df2['TimeStamp'],format='%Y-%m-%d'), right_on=pd.to_datetime(df1['TimeStamp'],format='%Y-%m-%d'),how='outer')

Getting below error:

raise KeyError(key) KeyError: 0 2019-08-27 Name: TimeStamp, dtype: datetime64[ns]

Any help would be highly appreciated.

Upvotes: 2

Views: 1401

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30971

First of all reverse the merge direction. A more natural approach is that the "base" DataFrame is df2.

The second point to correct is that the merge should be on TimeStamp column, converted to DateTime and in case of df2 - with the time part "cancelled" (I assume that in both source DataFrames TimeStamp column is of string type).

Third detail is that:

  • the object to merge to df2 should be only Value1 column from df1.
  • the merge mode (how) should be left.

Outer merge mode would give weird result, e.g. if df1 contained rows with dates absent in df2.

In this case, the result would contain rows with NaN in both TimeStamp and Value2 and the Value from these rows from df1, without "source" dates.

So, taking into account above remarks, I propose the following solution:

Generate a Series to merge:

  • from df1 with index set to TimeStamp column, coverted to DateTime,
  • take only Value1 column:

    val1 = df1.set_index(pd.to_datetime(df1.TimeStamp)).Value1
    

Then perform merge of:

  • df2 with index set to TimeStamp column, coverted to DateTime, and cancelled time part,
  • with val1,
  • on indices in both sources,
  • in left mode,
  • and drop the current index (create the new from scratch):

    df2.set_index(pd.to_datetime(df2.TimeStamp).dt.floor('d'))\
        .merge(val1, left_index=True, right_index=True, how='left')\
        .reset_index(drop=True)
    

The result is:

             TimeStamp Value2 Value1
0  01-01-2019 09:23:52     v5     v1
1  01-01-2019 10:33:52     v6     v1
2  01-01-2019 11:43:52     v7     v1

So, although the merge was on date part only, the result still has original TimeStamp from df2.

Edit

Your code failed also because the actual format of both TimeStamp columns was "day-month-year", whereas you specified %Y-%m-%d (year-month-day). My advice: If not needed, don't specify the date / time format. pd.to_datetime is clever enough to recognize the format from source data.

Upvotes: 1

Related Questions