Reputation: 950
I have two pandas Series of different lengths.
The shorter one holds a set of float values. For each of those values, I want to find the closest match (smallest absolute difference) among the elements of the longer series.
I also want to know the indices of those closest matches in the longer series.
I tried using the reindex method, but it raises 'ValueError: cannot reindex a non-unique index with a method or limit', because the longer series contains duplicate values and I set those values as the index.
This is the code I used to try to find, for each element of series B, the closest match among the elements of series A:
import pandas as pd
A = pd.Series([1.0, 4.0, 10.0, 4.0, 5.0, 19.0, 20.0])
B = pd.Series([0.8, 5.1, 10.1, 0.3, 5.5])
pd.Series(A.values, A.values).reindex(B.values, method='nearest')
ValueError: cannot reindex a non-unique index with a method or limit
In the end, I would like a dataframe like the following:
B     Closest_match_in_Series_A  Index_of_closest_match_in_Series_A
0.8   1.0                        0
5.1   5.0                        4
10.1  10.0                       2
0.3   1.0                        0
5.5   5.0                        4
Upvotes: 1
Views: 221
Reputation: 323286
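Here is one way, using NumPy broadcasting:
import numpy as np
# Pairwise |B_j - A_i| has shape (len(A), len(B)); argmin over axis 0 picks the nearest row of A for each element of B.
A.iloc[np.abs(B.values-A.values[:,None]).argmin(axis=0)]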
0 1.0
4 5.0
2 10.0
0 1.0
4 5.0
dtype: float64
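To get from that result to the three-column dataframe in the question, the positions returned by argmin can be reused for both the matched values and their index labels. This is only a sketch: the names diff, pos and result are mine, and it builds a len(A) x len(B) matrix, so it assumes both series are small enough for that to fit in memory.

```python
import numpy as np
import pandas as pd

A = pd.Series([1.0, 4.0, 10.0, 4.0, 5.0, 19.0, 20.0])
B = pd.Series([0.8, 5.1, 10.1, 0.3, 5.5])

# Absolute difference for every (A_i, B_j) pair: shape (len(A), len(B))
diff = np.abs(A.to_numpy()[:, None] - B.to_numpy())

# For each element of B, the position in A of the nearest value
# (argmin returns the first position when several are equally close)
pos = diff.argmin(axis=0)

result = pd.DataFrame({
    "B": B.to_numpy(),
    "Closest_match_in_Series_A": A.to_numpy()[pos],
    "Index_of_closest_match_in_Series_A": A.index.to_numpy()[pos],
})
print(result)
```

Because argmin keeps the first minimum, a value that occurs more than once in A (4.0 here) would be reported at its first position.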
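And here is a fix for your reindex attempt: sort the index and call drop_duplicates so that it is unique before reindexing. Note that this returns only the matched values, not their positions in A.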
pd.Series(A.values, A.values).sort_index().drop_duplicates().reindex(B.values, method='nearest')
0.8 1.0
5.1 5.0
10.1 10.0
0.3 1.0
5.5 5.0
dtype: float64
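If the index labels are needed from this de-duplicated route as well, one possibility is to look up positions with Index.get_indexer instead of reindex. This is a sketch rather than part of the answer above: uniq, pos and result are names I made up, and it assumes that reporting the first occurrence of a duplicated value in A is acceptable.

```python
import pandas as pd

A = pd.Series([1.0, 4.0, 10.0, 4.0, 5.0, 19.0, 20.0])
B = pd.Series([0.8, 5.1, 10.1, 0.3, 5.5])

# Keep the first occurrence of each value and sort by value;
# the original index labels of A are preserved on uniq.
uniq = A.drop_duplicates().sort_values()

# Position, within the sorted unique values, of the nearest match for each B
pos = pd.Index(uniq.to_numpy()).get_indexer(B.to_numpy(), method="nearest")

result = pd.DataFrame({
    "B": B.to_numpy(),
    "Closest_match_in_Series_A": uniq.to_numpy()[pos],
    "Index_of_closest_match_in_Series_A": uniq.index.to_numpy()[pos],
})
print(result)
```

Unlike the broadcasting version, this does not materialise a len(A) x len(B) matrix, which may matter if the series are long.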
Upvotes: 3