mesh

Reputation: 937

Rows from pandas dataframe between two time values

I have a pandas dataframe like this (with timestamp converted to datetime object):

    id  timestamp
0   221 2020-11-07 12:02:00
1   223 2020-11-08 13:21:00
2   224 2020-11-09 12:50:00
3   225 2020-11-10 14:23:00
4   226 2020-11-11 12:25:00
5   227 2020-11-14 14:26:00

I want to find the number of rows whose time of day falls within an interval, e.g. 12:00-13:00. Here it would be 3 (entries 0, 2 and 4).

Upvotes: 2

Views: 2011

Answers (4)

JvdV

Reputation: 75840

As I commented, I think you can just use the between_time function:

CountRows = df.set_index('timestamp').between_time('12:00','13:00').shape[0]
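
For reference, a self-contained sketch of the line above, assuming the sample data from the question. The names in_window and count_rows are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [221, 223, 224, 225, 226, 227],
    "timestamp": pd.to_datetime([
        "2020-11-07 12:02:00",
        "2020-11-08 13:21:00",
        "2020-11-09 12:50:00",
        "2020-11-10 14:23:00",
        "2020-11-11 12:25:00",
        "2020-11-14 14:26:00",
    ]),
})

# between_time needs a DatetimeIndex, so move 'timestamp' into the index.
# set_index returns a new frame here; df itself is left unchanged.
in_window = df.set_index("timestamp").between_time("12:00", "13:00")

# Both endpoints are included by default; shape[0] is the row count.
count_rows = in_window.shape[0]
print(count_rows)  # 3
```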

This, in your case, sets the timestamp column as the index and then returns the count of rows between the two time values, where:

"The first element of the tuple returned by Dataframe.shape contains the number of items in index in a dataframe i.e. basically the number of rows in the dataframe."

Upvotes: 3

wwnde

Reputation: 26676

Please try

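Coerce timestamp to datetime and sort the rows ascending

  df['timestamp'] = pd.to_datetime(df['timestamp'])
  # Assigning a sorted Series back to the column would realign on the index
  # and leave the row order unchanged, so sort the frame itself.
  df.sort_values('timestamp', ascending=True, inplace=True)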

Reset the index without dropping it, so the previous index (id, if it was the index) is kept as a column

df.reset_index(drop=False, inplace=True)

Set timestamp as the new index to allow use of df.between_time

df.set_index(df['timestamp'], inplace=True)
df.between_time('12:00', '13:00')

Upvotes: 1

mesh

Reputation: 937

As suggested in the comments, between_time works well. One needs to make timestamp (a datetime64 column here) the index first, then call between_time.
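
A small sketch of why the index step matters, assuming the first three rows of the question's data. The variable names are illustrative only. Calling between_time on a frame that still has its default integer index raises a TypeError, while the same call succeeds once timestamp is the index.

```python
import pandas as pd

# First three rows of the question's data, with the default integer index.
df = pd.DataFrame({
    "id": [221, 223, 224],
    "timestamp": pd.to_datetime([
        "2020-11-07 12:02:00",
        "2020-11-08 13:21:00",
        "2020-11-09 12:50:00",
    ]),
})

# Without a DatetimeIndex, between_time refuses to run.
try:
    df.between_time("12:00", "13:00")
    raised = False
except TypeError:
    raised = True

# After moving 'timestamp' into the index, the same call works.
indexed = df.set_index("timestamp")
selected_ids = indexed.between_time("12:00", "13:00")["id"].tolist()
print(raised, selected_ids)
```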

Upvotes: 0

Suthiro

Reputation: 1290

As far as I know, the suggested between_time works only with a DatetimeIndex, so you can either set your DataFrame index to 'timestamp' and then use between_time or, alternatively, sort first

df.sort_values(by='timestamp', inplace=True)

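Then use a sorted search. Note that the bounds are full timestamps, so this covers the interval on a single date only (2020-11-07 here)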

start = df['timestamp'].searchsorted(pd.Timestamp('2020-11-07 12:00:00'), side='left')
end = df['timestamp'].searchsorted(pd.Timestamp('2020-11-07 13:00:00'), side='right')

Then find the number of rows

count = end - start
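
Because the search bounds above are full timestamps, the count covers one date. The sketch below, assuming the question's sample data, first reproduces that single-date count and then applies the same sorted-search idea to the time of day. The conversion to seconds since midnight and all variable names are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Timestamps copied from the question.
timestamps = pd.Series(pd.to_datetime([
    "2020-11-07 12:02:00",
    "2020-11-08 13:21:00",
    "2020-11-09 12:50:00",
    "2020-11-10 14:23:00",
    "2020-11-11 12:25:00",
    "2020-11-14 14:26:00",
]))

# Single-date search: sort, then bound by full timestamps.
sorted_ts = timestamps.sort_values()
start = sorted_ts.searchsorted(pd.Timestamp("2020-11-07 12:00:00"), side="left")
end = sorted_ts.searchsorted(pd.Timestamp("2020-11-07 13:00:00"), side="right")
count_one_day = end - start

# Time-of-day search: convert each timestamp to seconds since midnight,
# sort those values, and search with the interval bounds in seconds.
secs = (timestamps - timestamps.dt.normalize()).dt.total_seconds().to_numpy()
sorted_secs = np.sort(secs)
lo = np.searchsorted(sorted_secs, 12 * 3600, side="left")
hi = np.searchsorted(sorted_secs, 13 * 3600, side="right")
count_all_days = hi - lo

print(count_one_day, count_all_days)  # 1 3
```

The second count matches the 3 rows expected in the question, since it ignores the date part entirely.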

Upvotes: 1
