How to plot a time intervals histogram using pandas?

Question

Given a dataframe of intervals ('start time' and 'end time'), what would be an efficiant way to generate a histogram that show for each time point T, how many intervals are 'active' in it?

Code to generate a sample dataframe (apologies, I am sure there is a better way to do this):

from random import randint

intervals = []

for i in range(50):
    start = randint(0,50)
    intervals.append({'start':start, 'end':start+randint(0,50)})

intervals_df = pd.DataFrame(intervals)

An example:

For the following dataframe:

import pandas as pd
pd.DataFrame([{'start':2,'end':5},{'start':3,'end':8},{'start':9,'end':10},{'start':4,'end':5}])

The matching graph would be similar to:

My intuition is that the time intervals need to be broken somehow to discrete values so they could be binned, but how?

chrisb · Accepted Answer

You could use apply to generate the values for each range, then melt to reshape the data into long form.

In [113]: expanded = df.apply(lambda row: pd.Series(np.arange(row['start'], row['end'] + 1)), axis=1)

In [114]: expanded
Out[114]: 
   0   1   2   3   4   5
0  2   3   4   5 NaN NaN
1  3   4   5   6   7   8
2  9  10 NaN NaN NaN NaN
3  4   5 NaN NaN NaN NaN

In [115]: expanded = pd.melt(expanded)['value'].dropna()

In [116]: expanded
Out[116]: 
0      2
1      3
2      9
3      4
4      3
5      4
6     10
7      5
8      4
9      5
12     5
13     6
17     7
21     8
Name: value, dtype: float64

From there you could use the built-in histogram plot, or some other kind of binning.

In [117]: expanded.hist()

How to plot a time intervals histogram using pandas?

Answers (2)

Related Questions