Reputation: 1749
When using pandas merge_asof as in the following example:
import pandas as pd
left = pd.DataFrame({'a': [1.1, 5.5, 10.9], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'a': [1.0, 2.8, 5.4, 5.55, 7.4], 'right_val': [1, 2, 3, 6, 7]})
pd.merge_asof(left, right, on='a', direction='nearest', tolerance=5)
I get the error
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
1363
1364 else:
-> 1365 raise MergeError("key must be integer or timestamp")
1366
1367 # validate allow_exact_matches
MergeError: key must be integer or timestamp
This looks odd, since the documentation says:
on : label
Field name to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore this must be a numeric column, such as datetimelike, integer, or float. On or left_on/right_on must be given
so using a float as key should be fine...
I'm using pandas 0.23.0
Upvotes: 2
Views: 2193
Reputation: 2939
It looks like the tolerance parameter is only allowed for integer and timedelta values, hence the error. The merge runs fine without it.
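Maybe you can work around it with a post-processing step:
import numpy as np
# keep a copy of the right key, since the merged 'a' column holds the left values
right["b"] = right["a"]
df_result = pd.merge_asof(left, right, on='a', direction='nearest')
# null out rows whose nearest match is farther away than the tolerance (5)
df_result.loc[abs(df_result["b"]-df_result["a"]) > 5, :] = np.nan
df_result = df_result.dropna()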
This merges on the nearest key, nulls any rows where the match is farther away than your tolerance (5 in this case), and then drops those null rows as if they had never existed.
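The same idea can be wrapped in a small helper, sketched below. The function name merge_asof_float_tolerance and the temporary column name _right_key are made up for illustration and are not part of pandas. Instead of writing NaN into the rows, this version filters with a boolean mask, which should leave the dtype of integer columns such as right_val unchanged. A tolerance of 0.5 is used here so that the filter actually removes a row from the question's data.

```python
import pandas as pd


def merge_asof_float_tolerance(left, right, on, tolerance, direction="nearest"):
    """Nearest-key merge, then drop matches farther away than `tolerance`.

    Hypothetical helper, not a pandas function.
    """
    key_copy = "_right_key"  # hypothetical temporary column name
    # Copy the right key so it survives the merge (the merged `on` column
    # holds the left values). assign() returns a new frame, so the caller's
    # `right` is not modified.
    right = right.assign(**{key_copy: right[on]})
    merged = pd.merge_asof(left, right, on=on, direction=direction)
    # Keep only rows whose matched right key lies within the tolerance.
    within = (merged[key_copy] - merged[on]).abs() <= tolerance
    return merged.loc[within].drop(columns=key_copy).reset_index(drop=True)


left = pd.DataFrame({'a': [1.1, 5.5, 10.9], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'a': [1.0, 2.8, 5.4, 5.55, 7.4], 'right_val': [1, 2, 3, 6, 7]})

# The row with a=10.9 is dropped: its nearest right key (7.4) is 3.5 away.
result = merge_asof_float_tolerance(left, right, on='a', tolerance=0.5)
print(result)
```

Like the dropna() version, this discards left rows that have no match within the tolerance, so it behaves like an inner join rather than a left join.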
Upvotes: 2