Bryan Fok
Bryan Fok

Reputation: 3487

Pandas DataFrame How to query the closest datetime index?

How do i query for the closest index from a Pandas DataFrame? The index is DatetimeIndex

2016-11-13 20:00:10.617989120   7.0 132.0
2016-11-13 22:00:00.022737152   1.0 128.0
2016-11-13 22:00:28.417561344   1.0 132.0

I tried this:

df.index.get_loc(df.index[0], method='nearest')

but it give me InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Same error if I tried this:

dt = datetime.datetime.strptime("2016-11-13 22:01:25", "%Y-%m-%d %H:%M:%S")
df.index.get_loc(dt, method='nearest')

But if I remove method='nearest' it works, but that is not I want, I want to find the closest index from my query datetime

Upvotes: 61

Views: 68430

Answers (5)

Anaderi
Anaderi

Reputation: 791

in case you are interested in an easy way getting nearest value from a DateTime-indexed DataFrame use Index.asof method:

print(dt)
2016-11-13 22:00:25.450000

dt_indexed = df.index.asof(dt)
print(dt_indexed)
2016-11-13 22:00:28.417561344

Upvotes: 2

HarryChil
HarryChil

Reputation: 559

DatetimeIndex.get_loc is now deprecated in favour of DatetimeIndex.get_indexer...

ts = pd.to_datetime('2022-05-26 13:19:48.154000')        # example time
iloc_idx = df.index.get_indexer([ts], method='nearest')  # returns absolute index into df e.g. array([5])
loc_idx = df.index[iloc_idx]                             # if you want named index

my_val = df.iloc[iloc_idx]
my_val = df.loc[loc_idx]                                 # as above so below...

Upvotes: 38

Jakob
Jakob

Reputation: 246

I know it's an old question, but while searching for the same problems as Bryan Fok, I landed here. So for future searchers getting here, I post my solution. My index had 4 non-unique items (possibly due to rounding errors when recording the data). The following worked and showed the correct data:

dt = pd.to_datetime("2016-11-13 22:01:25.450")

s = df.loc[df.index.unique()[df.index.unique().get_loc(dt, method='nearest')]]

However, in case your nearest index occures multiple times, this will return multiple rows. If you want to catch that, you could test for it with:

if len(s) != len(df.columns):
    # do what is appropriate for your case
    # e.g. selecting only the first occurence
    s.iloc[0]

Edit: fixed the catching after some test

Upvotes: 1

Bryan Fok
Bryan Fok

Reputation: 3487

I believe jezrael solution works, but not on my dataframe (which i have no clue why). This is the solution that I came up with.

from bisect import bisect #operate as sorted container
timestamps = np.array(df.index)
upper_index = bisect(timestamps, np_dt64, hi=len(timestamps)-1) #find the upper index of the closest time stamp
df_index = df.index.get_loc(min(timestamps[upper_index], timestamps[upper_index-1],key=lambda x: abs(x - np_dt64))) #find the closest between upper and lower timestamp

Upvotes: 3

jezrael
jezrael

Reputation: 862431

It seems you need first get position by get_loc and then select by []:

dt = pd.to_datetime("2016-11-13 22:01:25.450")
print (dt)
2016-11-13 22:01:25.450000

print (df.index.get_loc(dt, method='nearest'))
2

idx = df.index[df.index.get_loc(dt, method='nearest')]
print (idx)
2016-11-13 22:00:28.417561344
#if need select row to Series use iloc
s = df.iloc[df.index.get_loc(dt, method='nearest')]
print (s)
b      1.0
c    132.0
Name: 2016-11-13 22:00:28.417561344, dtype: float64

Upvotes: 77

Related Questions