Aquiles Páez
Aquiles Páez

Reputation: 573

problem with using merge_asof from Pandas

I have these two dataframes:

DF1=

 Inflow
0  9810998109
1  5591255912
2  7394273942
3  7866678666
4  1820118202
5  9812198109
6  9810998101
7  4304043040
8  9810998121

DF2=

       Inflow  mi_to_zcta5
0  3371433756    11.469054
1  1790118201    24.882142

I'd like to do an operation where I can merge these two dataframes based in the 'Inflow' column. Sort of like trying to recreate a VLookUp Excel function with approximate matching (like its shown in this question). But I get a failure each time. The line I've been trying to use for this purpose is this one:

test = pd.merge_asof(DF1, DF2, on = 'mi_to_zcta5')

I've tried using the additional settings like 'allow_exact_matches' set to 'False' but I've not succeeded.

This is the error I get:

 return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 'mi_to_zcta5'

I'd like to get a dataframe with 10 rows, with the "Inflow" column and an additional column of 'mi_to_zcta5' with the correspondant nearest value (if that's possible). Just like one would do in VLookUp in excel using approximate match.

Thanks in advance!

Upvotes: 0

Views: 3923

Answers (1)

Karn Kumar
Karn Kumar

Reputation: 8816

This is your solution:

In your first dataFrame(df1) you have only column while Second(df2) has two, while doing pd.merge you have to choose outer, which is a union of the keys. This means all of the indexes are shown and where it has missing cols it keeps them as NaN.

>>> df1
       Inflow
0  9810998109
1  5591255912
2  7394273942
3  7866678666
4  1820118202
5  9812198109
6  9810998101
7  4304043040
8  9810998121
>>> df2
       Inflow  mi_to_zcta5
0  3371433756    11.469054
1  1790118201    24.882142
>>>
>>>
>>>
>>> pd.merge( df1, df2, on=['Inflow'], how='outer')
        Inflow  mi_to_zcta5
0   9810998109          NaN
1   5591255912          NaN
2   7394273942          NaN
3   7866678666          NaN
4   1820118202          NaN
5   9812198109          NaN
6   9810998101          NaN
7   4304043040          NaN
8   9810998121          NaN
9   3371433756    11.469054
10  1790118201    24.882142

Note: you can not merge on the Key 'mi_to_zcta5 as this is not present on df

Upvotes: 2

Related Questions