Reputation: 919

How to count no. of rows between time intervals(hourly) in pandas?

My data has various columns including a date and a time column. The data is stretched across three months. I need to count no. of rows in a particular hour irrespective of the date. So that would mean getting the count of rows in 00:00 to 01:00 window and similarly for the rest 23 hours. How do I do that? Overall I will have 24 rows with their counts. Here is my data:

>>>df[["date","time"]]

    date    time
0   2006-11-10  00:01:21
1   2006-11-10  00:02:26
2   2006-11-10  00:02:38
3   2006-11-10  00:05:38
4   2006-11-10  00:05:38

Output should be like:

00:00-00:59 SomeCount

Both are object types

Upvotes: 2

Answers (4)

Valdi_Bo

Reputation: 31011

My solution generates row counts for all 24 hours, with 0 for hours "absent" in the source DataFrame.

To show a more instructive example, I defined the source DataFrame containing rows from several hours:

         date      time
0  2006-11-10  01:21:00
1  2006-11-10  02:26:00
2  2006-11-10  02:38:00
3  2006-11-10  05:38:00
4  2006-11-10  05:38:00
5  2006-11-11  05:43:00
6  2006-11-11  05:51:00

Note that last 2 rows are from different date, but as you want grouping by hour only, they will be counted in the same group as previous 2 rows (hour 5).

The first step is to create a Series containing almost what you want:

wrk = df.groupby(pd.to_datetime(df.time).dt.hour).apply(
    lambda grp: grp.index.size).reindex(range(24), fill_value=0)

The initial part of wrk is:

The left column (the index) contains hour as an integer and the right column is the count - how many rows are in this hour.

The only thing to do is to reformat the index to your desired format:

wrk.index = wrk.index.map(lambda h: f'{h:02}:00-{h:02}:59')

The result (initial part only) is:

time
00:00-00:59    0
01:00-01:59    1
02:00-02:59    2
03:00-03:59    0
04:00-04:59    0
05:00-05:59    4
06:00-06:59    0
07:00-07:59    0

But if you want to get counts only for hours present in your source data, then drop .reindex(…) from the above code.

Then your (full) result, for the above DataFrame will be:

time
01:00-01:59    1
02:00-02:59    2
05:00-05:59    4
dtype: int64

Upvotes: 1

jezrael

Reputation: 863741

I think simpliest is convert both columns to datetimes and count hours by Series.dt.hour with Series.value_counts:

out = pd.to_datetime(df["date"] + ' ' + df["time"]).dt.hour.value_counts().sort_index()

Or if need your format use Series.dt.strftime with GroupBy.size:

s = pd.to_datetime(df["date"] + ' ' + df["time"]).dt.strftime('%H:00-%H:59')
print (s)
0    00:00-00:59
1    00:00-00:59
2    00:00-00:59
3    00:00-00:59
4    00:00-00:59
dtype: object

out = s.groupby(s, sort=False).size()
print (out)

00:00-00:59    5
dtype: int64

Last for DataFrame use:

df = out.rename_axis('times').reset_index(name='count')

Upvotes: 1

Rob Raymond

Reputation: 31236

groupby() the hour
then build DF that has values you want
cleanup indexes and column names

import io
df = pd.read_csv(io.StringIO("""    date    time
0   2006-11-10  00:01:21
1   2006-11-10  00:02:26
2   2006-11-10  00:02:38
3   2006-11-10  00:05:38
4   2006-11-10  02:05:38"""), sep="\s\s+", engine="python")

dfc = (df.groupby(pd.to_datetime(df.time).dt.hour)
 .apply(lambda d: pd.DataFrame({"count":[len(d)]}, 
                               index=[pd.to_datetime(d["time"]).min().strftime("%H:%M")
                                      +"-"+pd.to_datetime(d["time"]).max().strftime("%H:%M")]))
 .reset_index()
 .drop(columns=["time"])
 .rename(columns={"level_1":"time"})
)

	time	count
0	00:01-00:05	4
1	02:05-02:05	1

Upvotes: 1

rupinderg00

Reputation: 189

You can split the time string with the delimiter :. Then create an another column hour for hour. Then use groupby() to group them on the basis of new column hour. You can now store the data in a new series or dataframe to get the desired output

Upvotes: 1

How to count no. of rows between time intervals(hourly) in pandas?

Answers (4)

Related Questions