user1403546

pandas merge_asof error when using float as key

When I use pandas merge_asof as in the following example:

import pandas as pd

left = pd.DataFrame({'a': [1.1, 5.5, 10.9], 'left_val': ['a', 'b', 'c']})

right = pd.DataFrame({'a': [1.0, 2.8, 5.4, 5.55, 7.4], 'right_val': [1, 2, 3, 6, 7]})

pd.merge_asof(left, right, on='a', direction='nearest', tolerance=5)

I get this error:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
   1363 
   1364             else:
-> 1365                 raise MergeError("key must be integer or timestamp")
   1366 
   1367         # validate allow_exact_matches

MergeError: key must be integer or timestamp

This seems odd, since the documentation says:

on : label

Field name to join on. Must be found in both DataFrames. The data MUST be ordered. Furthermore this must be a numeric column, such as datetimelike, integer, or float. On or left_on/right_on must be given

So using a float column as the key should be fine.

I'm using pandas 0.23.0.

Answers (1)

Sven Harris

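It looks like the tolerance parameter is only supported for integer keys (with an integer tolerance) and datetime-like keys (with a Timedelta tolerance), hence the error for a float key. The merge runs fine without tolerance.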
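A minimal sketch using the question's data. The first call simply leaves out tolerance, which is accepted with a float key. The second part is an alternative workaround, not from the original answer: convert the float keys to int64 so that an integer tolerance is allowed. The scale factor of 100 is an assumption that is exact here only because every key has at most two decimal places.

```python
import pandas as pd

left = pd.DataFrame({'a': [1.1, 5.5, 10.9], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'a': [1.0, 2.8, 5.4, 5.55, 7.4], 'right_val': [1, 2, 3, 6, 7]})

# 1) Float key without tolerance: no MergeError.
nearest = pd.merge_asof(left, right, on='a', direction='nearest')
print(nearest)

# 2) Alternative: scale the float keys to integers so an integer tolerance is accepted.
# SCALE = 100 is an assumption; it is only exact for keys with at most two decimals.
SCALE = 100
left_i = left.assign(k=(left['a'] * SCALE).round().astype('int64'))
# Drop the right-hand float key so the result does not get a_x / a_y columns.
right_i = right.assign(k=(right['a'] * SCALE).round().astype('int64')).drop(columns='a')
# A tolerance of 5 in the original units becomes 5 * SCALE in the scaled units.
scaled = pd.merge_asof(left_i, right_i, on='k', direction='nearest', tolerance=5 * SCALE)
print(scaled.drop(columns='k'))
```

Rounding before the int64 cast matters: a product such as 5.55 * 100 is not exactly 555.0 in floating point, and a plain cast would truncate it.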

As a workaround, you can leave out tolerance and apply it in a post-processing step:

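import numpy as np
import pandas as pd

right["b"] = right["a"]  # keep a copy of the right-hand key for the distance check
df_result = pd.merge_asof(left, right, on='a', direction='nearest')
# blank out rows whose nearest match is further away than the tolerance (5)
df_result.loc[(df_result["b"] - df_result["a"]).abs() > 5, :] = np.nan
df_result = df_result.dropna()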

This merges on the nearest key, sets to NaN every row whose matched key is more than the tolerance (5 in this case) away from the left key, and then drops those rows as if they had never existed.
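One difference worth noting: merge_asof is a left join, so with tolerance it keeps every left row and only fills the right-hand columns with NaN when no match is close enough, whereas the post-processing above drops those rows entirely. A sketch that mirrors the tolerance behaviour more closely is below. The column name b and the tolerance TOL = 3.0 are hypothetical choices; 3.0 is used instead of 5 so that the 10.9 row (whose nearest match is 7.4) falls outside it.

```python
import numpy as np
import pandas as pd

left = pd.DataFrame({'a': [1.1, 5.5, 10.9], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'a': [1.0, 2.8, 5.4, 5.55, 7.4], 'right_val': [1, 2, 3, 6, 7]})
TOL = 3.0  # hypothetical tolerance, chosen so one left row has no close enough match

# Copy the right-hand key into 'b', because the result keeps only the left 'a'.
matched = pd.merge_asof(left, right.assign(b=right['a']), on='a', direction='nearest')

# Blank out only the right-hand columns for matches that are too far away,
# keeping every left row, as the tolerance parameter would.
too_far = (matched['b'] - matched['a']).abs() > TOL
matched.loc[too_far, ['right_val', 'b']] = np.nan
result = matched.drop(columns='b')
print(result)
```

Setting NaN into right_val upcasts that column from int64 to float64, which is also what merge_asof does when tolerance leaves a row unmatched.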