shah
shah

Reputation: 29

get all rows with latest previous timestamp in python df

I have a df with a datetime64 time column. Given a timestamp t, I'd like to be able to get all rows with time that is closest but before t.

Currently, I create a sorted list of time (with duplicates removed), get the first timestamp from that list, and then select all rows with this timestamp.

Is there a better way (ie. a vectorized operation) to do this?

For example, given the following dataframe and t = 1/1/1980 2:00:59, I want to return pd.DataFrame([Timestamp('1/1/1980 2:00:00'),2],[Timestamp('1/1/1980 2:00:00'),2])

time some_data
1/1/1980 1:00:00 1
1/1/1980 2:00:00 2
1/1/1980 2:00:00 3
1/1/1980 2:01:00 4

Upvotes: 0

Views: 690

Answers (3)

user760900
user760900

Reputation: 606

nearest() and apply() seem to be what you're looking for

>>> from datetime import datetime
>>> import pandas as pd
>>> import numpy as np

>>> df = pd.DataFrame({'time':['1/1/1980 1:00:00','1/1/1980 2:00:00','1/1/1980 2:00:00','1/1/1980 2:01:00'], 'some_data':[1,2,3,4]})
>>> df['time'] = df.time.apply(lambda x:datetime.strptime(x,'%d/%m/%Y %H:%M:%S'))
>>> df
                 time  some_data
0 1980-01-01 01:00:00          1
1 1980-01-01 02:00:00          2
2 1980-01-01 02:00:00          3
3 1980-01-01 02:01:00          4
>>> target = datetime.strptime('1/1/1980 2:00:59', '%d/%m/%Y %H:%M:%S')
>>> target
datetime.datetime(1980, 1, 1, 2, 0, 3)

>>> def nearest(df,t):
...     return df.iloc[df['time'].apply(lambda x: t - x if t > x else np.nan).idxmin()]
>>> nearest(df,target)
time         1980-01-01 02:00:00
some_data                      2
Name: 1, dtype: object

Based off of Daniel Aben-Athar Bemerguy's answer and subsequent comments

Upvotes: 2

I´m not very good with dates but here is a solution that might work for you:

from datetime import datetime
import pandas as pd

df = pd.DataFrame({'time':['1/1/1980 1:00:00','1/1/1980 2:00:00','1/1/1980 2:01:00']})

This will give the following DataFrame

0   1980-01-01 01:00:00
1   1980-01-01 02:00:00
2   1980-01-01 02:01:00
Name: time, dtype: datetime64[ns]
target = datetime.strptime('1/1/1980 2:00:03', '%d/%m/%Y %H:%M:%S')

def nearest(df,t):
    return df.iloc[df.apply(lambda x: abs(x - t)).idxmin()]

df = df.time.apply(lambda x:datetime.strptime(x,'%d/%m/%Y %H:%M:%S'))

nearest(df,target)

OUTPUT

Timestamp('1980-01-01 02:00:00')

Upvotes: 2

Hakan Akgün
Hakan Akgün

Reputation: 927

If I got your question right, this shall anwer your question:

import datetime
latest_row=pd.Timestamp.min
count=0
t = "1/1/1980 2:00:03"
t=datetime.datetime.strptime(t,"%m/%d/%Y %H:%M:%S")
def fetch_latest(df_vals,date):
    global latest_row
    count=1
    for row in df.values:
        row_date=row[0]
        if row_date<date:
            if row_date==latest_row:
                count+=1
            if row_date>latest_row:
                count=1
                latest_row=row_date                
    Latest_rows=[latest_row]*count
    return Latest_rows


fetch_latest(df,t)

enter image description here

Upvotes: 0

Related Questions