Ibrahim Sherif
Ibrahim Sherif

Reputation: 546

Aggregate and count number of continuous time intervals in pandas

I want to aggregate the continuous time intervals and count them for each person in pandas.

As an example I have dataframe that looks like this:

     Name      Time
0     Bob  10:59:00
1     Bob  11:00:00
2     Bob  11:01:00
3     Bob  11:14:00
4     Bob  11:15:00
5     Bob  11:17:00
6     Bob  11:18:00
7     Bob  11:19:00
8     Bob  13:10:00
9     Bob  13:10:00
10    Bob  13:15:00
11    Bob  13:16:00
12    Bob  13:17:00
13    Bob  13:18:00
14    Bob  13:19:00
15  Alice  13:17:00
16  Alice  13:18:00
17  Alice  13:19:00

With the following output:

Note: I only need the count not the time intervals themselves.

Bob = 5
1. 10:59 - 11:01
2. 11:14 - 11:15
3. 11:17 - 11:19
4. 13:10 - 13:10
5. 13:15 - 13:19

Alice = 1
1. 13:17 - 13:19

Upvotes: 0

Views: 456

Answers (1)

Valdi_Bo
Valdi_Bo

Reputation: 30971

I assume that your Time column is of string type.

You can use the following code:

result = df.groupby('Name', sort=False).apply(
    lambda grp: grp.groupby((pd.to_datetime(grp.Time).diff().dt
    .total_seconds() / 60).fillna(2).ge(2).cumsum())
    .Time.apply(lambda tt: f'{tt.iloc[0]} - {tt.iloc[-1]}'))\
    .rename_axis(['Name', 'Period'])

The result is:

Name   Period
Bob    1         10:59:00 - 11:01:00
       2         11:14:00 - 11:15:00
       3         11:17:00 - 11:19:00
       4         13:10:00 - 13:10:00
       5         13:15:00 - 13:19:00
Alice  1         13:17:00 - 13:19:00
Name: Time, dtype: object

The above code contains 2-level grouping:

  • First level - by Name,
  • Second level - by the number of a "period" - a sequence of consecutive time values.

From each 2nd level group there is taken the first and last time (separated with a minus).

The last part is to rename columns in the index, to better reflect the meaning of the data displayed.

If you want a printout looking almost exactly as in our post, run:

for key, grp in result.groupby(level=0, sort=False):
    print(f'{key} = {grp.size}')
    print(f'{grp.reset_index(level=0, drop=True).rename_axis(None).to_string()}\n')

(try yourself to see the result).

This time there are no dots after the period number, but in my opinion it is not important.

Upvotes: 1

Related Questions