How to plot number of events occurring at each hour of the day in a pandas dataframe?

Question

Say I have the following data:

import pandas as pd
data = {'time':[7, 1, 2, 7, 2, 2, 1, 2, 7, 3, 5], 'event':['a', 'b', 'a', 'a', 'b', 'a', 'a', 'b', 'b', 'b', 'a']}
df = pd.DataFrame(data)

I want to display how many events of each type occurred at each hour of the day. However, there are only 5 unique times present in the "time" column of the dataset.

Plotting a histogram with bins=24 works when all 24 unique hours of the day (1 to 24) are present in the dataset. But when only a few hours are present, the histogram no longer does the job.

For example, with the above data, the code df.hist() produces this chart:

It is unclear exactly where the x-axis ticks are located. What I want is for the 5 spikes in this chart to be located at x = 1, 2, 3, 5 and 7, with no spikes at x = 4, 6 and 8 through 24.

With df.time.hist(bins=24), the following chart is produced:

This is a bit better: at least the first 4 spikes are located at x = 1, 2, 3 and 5, with x = 4 and x = 6 left blank. However, at x = 7 the spike is drawn to the left of the grid line, while the other 4 spikes are drawn to the right of their grid lines. Also, the x-axis stops at 7, so the empty hours from x = 8 through 24 are not shown.

So, how do I do it?

sagi · Accepted Answer

Try this:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = {'time':[7, 1, 2, 7, 2, 2, 1, 2, 7, 3, 5], 'event':['a', 'b', 'a', 'a', 'b', 'a', 'a', 'b', 'b', 'b', 'a']}
df = pd.DataFrame(data)
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(16, 10))

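# 25 bin edges (0..24) give 24 one-hour bins for hours 0..23;
# range(24) would give only 23 bins and merge hours 22 and 23
df.hist(ax=axes, bins=range(25))

# shift the xticks to the centre of each bin
axes.set_xticks(np.arange(24) + .5)

# label each bin with its hour
axes.set_xticklabels(range(24))

plt.show()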
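
The histogram above counts all events together, while the question also asks for counts per event type. One possible way to get that, offered as a sketch rather than as part of the original answer, is to tabulate the counts with pd.crosstab, reindex onto the full set of hours so the missing ones appear as zeros, and draw a grouped bar chart. The 1 to 24 hour numbering follows the question and is an assumption here; the axis labels are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {
    "time": [7, 1, 2, 7, 2, 2, 1, 2, 7, 3, 5],
    "event": ["a", "b", "a", "a", "b", "a", "a", "b", "b", "b", "a"],
}
df = pd.DataFrame(data)

# Assumption: hours are numbered 1..24, as described in the question.
hours = range(1, 25)

# Count rows per (hour, event type), then add the missing hours as all-zero rows.
counts = pd.crosstab(df["time"], df["event"]).reindex(hours, fill_value=0)

if __name__ == "__main__":
    # One group of bars per hour, one bar per event type.
    ax = counts.plot.bar(figsize=(16, 10), rot=0)
    ax.set_xlabel("hour of day")
    ax.set_ylabel("number of events")
    plt.show()
```

Because the bars are drawn from a table with one row per hour, every hour gets its own tick and hours with no events simply show no bar, so there is no bin-edge alignment to adjust.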
