user2808117
user2808117

Reputation: 4897

How to plot a time intervals histogram using pandas?

Given a dataframe of intervals ('start time' and 'end time'), what would be an efficiant way to generate a histogram that show for each time point T, how many intervals are 'active' in it?

Code to generate a sample dataframe (apologies, I am sure there is a better way to do this):

from random import randint

intervals = []

for i in range(50):
    start = randint(0,50)
    intervals.append({'start':start, 'end':start+randint(0,50)})

intervals_df = pd.DataFrame(intervals)

An example:

For the following dataframe:

import pandas as pd
pd.DataFrame([{'start':2,'end':5},{'start':3,'end':8},{'start':9,'end':10},{'start':4,'end':5}])

The matching graph would be similar to:

enter image description here

My intuition is that the time intervals need to be broken somehow to discrete values so they could be binned, but how?

Upvotes: 1

Views: 1551

Answers (2)

JoelleFV
JoelleFV

Reputation: 21

chrisb's answer is great, but the reason why you have a gap between 5 and 6 is because the default number of bins is 10 by default. This means that the bins are distant by 0.8 in your case, so the gap is actually between 5.2 and 6.

A way to counteract this is to manually set the number of bins equal to the length of your data:

nb_bins = int(max(expanded) - min(expanded))
expanded.hist(bins = nb_bins)

Result: enter image description here 1

Upvotes: 1

chrisb
chrisb

Reputation: 52286

You could use apply to generate the values for each range, then melt to reshape the data into long form.

In [113]: expanded = df.apply(lambda row: pd.Series(np.arange(row['start'], row['end'] + 1)), axis=1)

In [114]: expanded
Out[114]: 
   0   1   2   3   4   5
0  2   3   4   5 NaN NaN
1  3   4   5   6   7   8
2  9  10 NaN NaN NaN NaN
3  4   5 NaN NaN NaN NaN

In [115]: expanded = pd.melt(expanded)['value'].dropna()

In [116]: expanded
Out[116]: 
0      2
1      3
2      9
3      4
4      3
5      4
6     10
7      5
8      4
9      5
12     5
13     6
17     7
21     8
Name: value, dtype: float64

From there you could use the built-in histogram plot, or some other kind of binning.

In [117]: expanded.hist()

hist

Upvotes: 3

Related Questions