stackoverlord
stackoverlord

Reputation: 25

Pandas find the index of the closest value in a data frame column given a list of values

So I am trying to find a vectorized way to do this, say I have a data frame DF1 that contains a timestamp column with a lot of timestamps in increasing order. I also have a list with a target time and I want to use this to find the row or index of that row in DF1 that contains the closest timestamp to that target time. I have a solution that uses iterrows but I was wondering if there is a way to do it without iterrows as this might take too long.

The number of rows in the data frame are always bigger than the list of target time values. If the target time is in between two time stamps then always pick the smallest time stamp.

#Dataframe with timestamp
In [1]: df = pd.DataFrame([100, 150, 155, 180,185, 200], columns=['TimeStamp'])
#List of target times
targetTimes = [120, 130, 180, 187]

The answer should be:

#A list of indices
[0, 1, 3, 4]
#Or something like this
Out[1]: 
   TimeStamp
0    100
1    150
2    180
3    185 

Upvotes: 1

Views: 958

Answers (1)

jezrael
jezrael

Reputation: 863301

Use merge_asof with convert list to one column DataFrame:

df1 = pd.DataFrame(targetTimes, columns=['new'])

df = pd.merge_asof(df1, 
                   df, 
                   left_on='new', 
                   right_on='TimeStamp',
                   direction='nearest')
print (df)
   new  TimeStamp
0  120        100
1  130        150
2  180        180
3  187        185

Upvotes: 2

Related Questions