Neuneck

Reputation: 334

How to count the number of rows in a given time interval in python pandas?

I have a pandas DataFrame with a number of columns that contain timestamps for certain events that can happen to objects, where the object IDs index the rows.

obj_id | event1  |  event2  |  event3  |  ...
1      | datetime| datetime |  NaT     |  ...
...    | ...     | ...      |  ...     |  ...

I want to count the number of occurrences of an event over the course of the day (discarding the date), in intervals I specify.

So far, I solve this by reconstructing the number of minutes since midnight using datetime.hour and datetime.minute:

i = 5    # number of minutes in the interval I'm interested in
ev1_counts = df.groupby(
                        df.event1.apply(lambda x: i * ((60*x.hour + x.minute)//i))
                        )['event1'].count()

This does the job, but it seems unpythonic and I'm sure there is a better way. What would that be?

I have seen this question, but trying

time_series = pd.DatetimeIndex(df.event1)
ts_df =  pd.Series([1]*len(time_series), index=time_series)
ev1_counts = ts_df.groupby(pd.TimeGrouper(freq='{:d}Min'.format(i))).count()

keeps the date information, which I want to discard. Converting the pd.datetime objects with the .time() method seems problematic, since the result cannot be treated as a datetime object.

Upvotes: 1

Views: 2003

Answers (1)

jezrael

Reputation: 863741

It seems you can omit apply and simplify the solution by using the vectorized .dt accessor:

ev1_counts = df.groupby((60*df.event1.dt.hour+df.event1.dt.minute)//i * i)['event1'].count()
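
For reference, here is a minimal self-contained sketch of the same idea. The sample timestamps and obj_id values are made up for illustration, and the intermediate names minute_of_day and bins are not from the question; only event1, i and ev1_counts are.

```python
import pandas as pd

# Hypothetical sample data: four events on different dates plus one missing value.
df = pd.DataFrame(
    {
        "event1": pd.to_datetime(
            [
                "2017-01-01 00:01:00",
                "2017-01-02 00:03:30",
                "2017-01-03 00:07:00",
                "2017-01-04 12:34:00",
                None,  # becomes NaT
            ]
        )
    },
    index=pd.Index([1, 2, 3, 4, 5], name="obj_id"),
)

i = 5  # interval length in minutes

# Minutes since midnight for each event; the date part is ignored.
minute_of_day = 60 * df["event1"].dt.hour + df["event1"].dt.minute

# Round down to the start of the i-minute interval the event falls into.
bins = minute_of_day // i * i

# Count events per interval, regardless of the day they happened on.
ev1_counts = df.groupby(bins)["event1"].count()
print(ev1_counts)
```

The group keys are minutes since midnight at the start of each interval. Rows where event1 is NaT produce a missing key and are left out of the result by groupby's default behaviour.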

Upvotes: 1
