MikeB2019x
MikeB2019x

Reputation: 1205

keep pandas merge_asof finding nearest value only within the same dat

I want to merge two timeseries dataframes. The values in the first df should merge in the values of the second df that have the closest (but not exactly matching) datetime stamp. BUT the datetime stamp of the second df should only be merged in IFF it is of the same date.

Pandas merge_asof provides the functionality desired except that it will merge values from any preceding date if that is the 'nearest'. The example below shows the current behaviour and the desired behaviour.

Any ways around the current behaviour?

import pandas as pd
from datetime import datetime

c = pd.DataFrame([["2017-01-25 00:30:17", "LQE7GNC8O"],
["2017-01-25 00:30:18", "IWK8UOOU1"],
["2017-01-25 00:30:46", "MULAPBFTI"],
["2017-01-25 01:00:08", "RO9N7X31Z"],
["2017-01-25 01:00:08", "SDSFKA3LK"],
["2017-01-25 01:00:42", "YRVASRCNT"],
["2017-01-25 01:00:43", "D5KVPH3H6"],
["2017-01-25 01:00:48", "QZ98EIW2O"],
["2017-01-25 01:01:49", "LBC4F46JG"],
["2017-01-25 02:01:49", "PKGUHA9SS"],
["2017-01-25 03:15:24", "8YD2DFLMZ"]], columns=["datetime", "key"])

d = pd.DataFrame([["2017-01-24 00:00:00", "111111111"],
["2017-01-24 23:00:22", "111111111"],
["2017-01-25 01:00:22", "J64SHBLXH"],
["2017-01-25 01:00:27", "XCKJCJWSR"],
["2017-01-25 01:15:42", "3XSN8RWEY"],
["2017-01-25 02:14:42", "UIQKQL9EH"]], columns=["datetime", "words"])

c["datetime"] = pd.to_datetime(c["datetime"])
d["datetime"] = pd.to_datetime(d["datetime"])

pd.merge_asof(c, d.sort_values("datetime"), on="datetime", allow_exact_matches=False)
YIELDS:
              datetime        key      words
0  2017-01-25 00:30:17  LQE7GNC8O  111111111
1  2017-01-25 00:30:18  IWK8UOOU1  111111111
2  2017-01-25 00:30:46  MULAPBFTI  111111111
3  2017-01-25 01:00:08  RO9N7X31Z  111111111
4  2017-01-25 01:00:08  SDSFKA3LK  111111111
5  2017-01-25 01:00:42  YRVASRCNT  XCKJCJWSR
6  2017-01-25 01:00:43  D5KVPH3H6  XCKJCJWSR
7  2017-01-25 01:00:48  QZ98EIW2O  XCKJCJWSR
8  2017-01-25 01:01:49  LBC4F46JG  XCKJCJWSR
9  2017-01-25 02:01:49  PKGUHA9SS  3XSN8RWEY
10 2017-01-25 03:15:24  8YD2DFLMZ  UIQKQL9EH

DESIRED:
              datetime        key      words
0  2017-01-25 00:30:17  LQE7GNC8O  NaN
1  2017-01-25 00:30:18  IWK8UOOU1  NaN
2  2017-01-25 00:30:46  MULAPBFTI  NaN
3  2017-01-25 01:00:08  RO9N7X31Z  NaN
4  2017-01-25 01:00:08  SDSFKA3LK  NaN
5  2017-01-25 01:00:42  YRVASRCNT  XCKJCJWSR
6  2017-01-25 01:00:43  D5KVPH3H6  XCKJCJWSR
7  2017-01-25 01:00:48  QZ98EIW2O  XCKJCJWSR
8  2017-01-25 01:01:49  LBC4F46JG  XCKJCJWSR
9  2017-01-25 02:01:49  PKGUHA9SS  3XSN8RWEY
10 2017-01-25 03:15:24  8YD2DFLMZ  UIQKQL9EH

Upvotes: 0

Views: 1085

Answers (1)

BENY
BENY

Reputation: 323366

Here you go pass the date to by

pd.merge_asof(c.assign(date=c.datetime.dt.date), d.sort_values("datetime").assign(date=d.datetime.dt.date), on="datetime", allow_exact_matches=False , by = 'date')
Out[215]: 
              datetime        key        date      words
0  2017-01-25 00:30:17  LQE7GNC8O  2017-01-25        NaN
1  2017-01-25 00:30:18  IWK8UOOU1  2017-01-25        NaN
2  2017-01-25 00:30:46  MULAPBFTI  2017-01-25        NaN
3  2017-01-25 01:00:08  RO9N7X31Z  2017-01-25        NaN
4  2017-01-25 01:00:08  SDSFKA3LK  2017-01-25        NaN
5  2017-01-25 01:00:42  YRVASRCNT  2017-01-25  XCKJCJWSR
6  2017-01-25 01:00:43  D5KVPH3H6  2017-01-25  XCKJCJWSR
7  2017-01-25 01:00:48  QZ98EIW2O  2017-01-25  XCKJCJWSR
8  2017-01-25 01:01:49  LBC4F46JG  2017-01-25  XCKJCJWSR
9  2017-01-25 02:01:49  PKGUHA9SS  2017-01-25  3XSN8RWEY
10 2017-01-25 03:15:24  8YD2DFLMZ  2017-01-25  UIQKQL9EH

Upvotes: 2

Related Questions