Reputation: 29
I have a df with a datetime64 time
column. Given a timestamp t
, I'd like to be able to get all rows with time
that is closest but before t
.
Currently, I create a sorted list of time
(with duplicates removed), get the first timestamp from that list, and then select all rows with this timestamp.
Is there a better way (ie. a vectorized operation) to do this?
For example, given the following dataframe and t = 1/1/1980 2:00:59
, I want to return pd.DataFrame([Timestamp('1/1/1980 2:00:00'),2],[Timestamp('1/1/1980 2:00:00'),2])
time | some_data |
---|---|
1/1/1980 1:00:00 | 1 |
1/1/1980 2:00:00 | 2 |
1/1/1980 2:00:00 | 3 |
1/1/1980 2:01:00 | 4 |
Upvotes: 0
Views: 690
Reputation: 606
nearest()
and apply()
seem to be what you're looking for
>>> from datetime import datetime
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({'time':['1/1/1980 1:00:00','1/1/1980 2:00:00','1/1/1980 2:00:00','1/1/1980 2:01:00'], 'some_data':[1,2,3,4]})
>>> df['time'] = df.time.apply(lambda x:datetime.strptime(x,'%d/%m/%Y %H:%M:%S'))
>>> df
time some_data
0 1980-01-01 01:00:00 1
1 1980-01-01 02:00:00 2
2 1980-01-01 02:00:00 3
3 1980-01-01 02:01:00 4
>>> target = datetime.strptime('1/1/1980 2:00:59', '%d/%m/%Y %H:%M:%S')
>>> target
datetime.datetime(1980, 1, 1, 2, 0, 3)
>>> def nearest(df,t):
... return df.iloc[df['time'].apply(lambda x: t - x if t > x else np.nan).idxmin()]
>>> nearest(df,target)
time 1980-01-01 02:00:00
some_data 2
Name: 1, dtype: object
Based off of Daniel Aben-Athar Bemerguy's answer and subsequent comments
Upvotes: 2
Reputation: 515
I´m not very good with dates but here is a solution that might work for you:
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'time':['1/1/1980 1:00:00','1/1/1980 2:00:00','1/1/1980 2:01:00']})
This will give the following DataFrame
0 1980-01-01 01:00:00
1 1980-01-01 02:00:00
2 1980-01-01 02:01:00
Name: time, dtype: datetime64[ns]
target = datetime.strptime('1/1/1980 2:00:03', '%d/%m/%Y %H:%M:%S')
def nearest(df,t):
return df.iloc[df.apply(lambda x: abs(x - t)).idxmin()]
df = df.time.apply(lambda x:datetime.strptime(x,'%d/%m/%Y %H:%M:%S'))
nearest(df,target)
OUTPUT
Timestamp('1980-01-01 02:00:00')
Upvotes: 2
Reputation: 927
If I got your question right, this shall anwer your question:
import datetime
latest_row=pd.Timestamp.min
count=0
t = "1/1/1980 2:00:03"
t=datetime.datetime.strptime(t,"%m/%d/%Y %H:%M:%S")
def fetch_latest(df_vals,date):
global latest_row
count=1
for row in df.values:
row_date=row[0]
if row_date<date:
if row_date==latest_row:
count+=1
if row_date>latest_row:
count=1
latest_row=row_date
Latest_rows=[latest_row]*count
return Latest_rows
fetch_latest(df,t)
Upvotes: 0