Reputation: 25
So I am trying to find a vectorized way to do this, say I have a data frame DF1 that contains a timestamp column with a lot of timestamps in increasing order. I also have a list with a target time and I want to use this to find the row or index of that row in DF1 that contains the closest timestamp to that target time. I have a solution that uses iterrows but I was wondering if there is a way to do it without iterrows as this might take too long.
The number of rows in the data frame are always bigger than the list of target time values. If the target time is in between two time stamps then always pick the smallest time stamp.
#Dataframe with timestamp
In [1]: df = pd.DataFrame([100, 150, 155, 180,185, 200], columns=['TimeStamp'])
#List of target times
targetTimes = [120, 130, 180, 187]
The answer should be:
#A list of indices
[0, 1, 3, 4]
#Or something like this
Out[1]:
TimeStamp
0 100
1 150
2 180
3 185
Upvotes: 1
Views: 958
Reputation: 863301
Use merge_asof
with convert list to one column DataFrame
:
df1 = pd.DataFrame(targetTimes, columns=['new'])
df = pd.merge_asof(df1,
df,
left_on='new',
right_on='TimeStamp',
direction='nearest')
print (df)
new TimeStamp
0 120 100
1 130 150
2 180 180
3 187 185
Upvotes: 2