Reputation: 1205
I want to merge two timeseries dataframes. The values in the first df should merge in the values of the second df that have the closest (but not exactly matching) datetime stamp. BUT the datetime stamp of the second df should only be merged in IFF it is of the same date.
Pandas merge_asof provides the functionality desired except that it will merge values from any preceding date if that is the 'nearest'. The example below shows the current behaviour and the desired behaviour.
Any ways around the current behaviour?
import pandas as pd
from datetime import datetime
c = pd.DataFrame([["2017-01-25 00:30:17", "LQE7GNC8O"],
["2017-01-25 00:30:18", "IWK8UOOU1"],
["2017-01-25 00:30:46", "MULAPBFTI"],
["2017-01-25 01:00:08", "RO9N7X31Z"],
["2017-01-25 01:00:08", "SDSFKA3LK"],
["2017-01-25 01:00:42", "YRVASRCNT"],
["2017-01-25 01:00:43", "D5KVPH3H6"],
["2017-01-25 01:00:48", "QZ98EIW2O"],
["2017-01-25 01:01:49", "LBC4F46JG"],
["2017-01-25 02:01:49", "PKGUHA9SS"],
["2017-01-25 03:15:24", "8YD2DFLMZ"]], columns=["datetime", "key"])
d = pd.DataFrame([["2017-01-24 00:00:00", "111111111"],
["2017-01-24 23:00:22", "111111111"],
["2017-01-25 01:00:22", "J64SHBLXH"],
["2017-01-25 01:00:27", "XCKJCJWSR"],
["2017-01-25 01:15:42", "3XSN8RWEY"],
["2017-01-25 02:14:42", "UIQKQL9EH"]], columns=["datetime", "words"])
c["datetime"] = pd.to_datetime(c["datetime"])
d["datetime"] = pd.to_datetime(d["datetime"])
pd.merge_asof(c, d.sort_values("datetime"), on="datetime", allow_exact_matches=False)
YIELDS:
datetime key words
0 2017-01-25 00:30:17 LQE7GNC8O 111111111
1 2017-01-25 00:30:18 IWK8UOOU1 111111111
2 2017-01-25 00:30:46 MULAPBFTI 111111111
3 2017-01-25 01:00:08 RO9N7X31Z 111111111
4 2017-01-25 01:00:08 SDSFKA3LK 111111111
5 2017-01-25 01:00:42 YRVASRCNT XCKJCJWSR
6 2017-01-25 01:00:43 D5KVPH3H6 XCKJCJWSR
7 2017-01-25 01:00:48 QZ98EIW2O XCKJCJWSR
8 2017-01-25 01:01:49 LBC4F46JG XCKJCJWSR
9 2017-01-25 02:01:49 PKGUHA9SS 3XSN8RWEY
10 2017-01-25 03:15:24 8YD2DFLMZ UIQKQL9EH
DESIRED:
datetime key words
0 2017-01-25 00:30:17 LQE7GNC8O NaN
1 2017-01-25 00:30:18 IWK8UOOU1 NaN
2 2017-01-25 00:30:46 MULAPBFTI NaN
3 2017-01-25 01:00:08 RO9N7X31Z NaN
4 2017-01-25 01:00:08 SDSFKA3LK NaN
5 2017-01-25 01:00:42 YRVASRCNT XCKJCJWSR
6 2017-01-25 01:00:43 D5KVPH3H6 XCKJCJWSR
7 2017-01-25 01:00:48 QZ98EIW2O XCKJCJWSR
8 2017-01-25 01:01:49 LBC4F46JG XCKJCJWSR
9 2017-01-25 02:01:49 PKGUHA9SS 3XSN8RWEY
10 2017-01-25 03:15:24 8YD2DFLMZ UIQKQL9EH
Upvotes: 0
Views: 1085
Reputation: 323366
Here you go pass the date
to by
pd.merge_asof(c.assign(date=c.datetime.dt.date), d.sort_values("datetime").assign(date=d.datetime.dt.date), on="datetime", allow_exact_matches=False , by = 'date')
Out[215]:
datetime key date words
0 2017-01-25 00:30:17 LQE7GNC8O 2017-01-25 NaN
1 2017-01-25 00:30:18 IWK8UOOU1 2017-01-25 NaN
2 2017-01-25 00:30:46 MULAPBFTI 2017-01-25 NaN
3 2017-01-25 01:00:08 RO9N7X31Z 2017-01-25 NaN
4 2017-01-25 01:00:08 SDSFKA3LK 2017-01-25 NaN
5 2017-01-25 01:00:42 YRVASRCNT 2017-01-25 XCKJCJWSR
6 2017-01-25 01:00:43 D5KVPH3H6 2017-01-25 XCKJCJWSR
7 2017-01-25 01:00:48 QZ98EIW2O 2017-01-25 XCKJCJWSR
8 2017-01-25 01:01:49 LBC4F46JG 2017-01-25 XCKJCJWSR
9 2017-01-25 02:01:49 PKGUHA9SS 2017-01-25 3XSN8RWEY
10 2017-01-25 03:15:24 8YD2DFLMZ 2017-01-25 UIQKQL9EH
Upvotes: 2