Reputation: 3744
Say I have the following data:
import pandas as pd
data = {'time':[7, 1, 2, 7, 2, 2, 1, 2, 7, 3, 5], 'event':['a', 'b', 'a', 'a', 'b', 'a', 'a', 'b', 'b', 'b', 'a']}
df = pd.DataFrame(data)
I want to display how many events of each type occurred at each hour of the day. However, there are only 5 unique times present in the "time" column of the dataset.
Plotting a histogram with bins=24 works when all 24 hours of the day (1 to 24) are present in the dataset. But if only a few hours of the day are present, the histogram doesn't do what I want.
For example, with the above data, the code df.hist() produces this chart:
It is unclear where exactly the x-axis ticks are located. What I want is for the 5 spikes in this chart to be located at x = 1, 2, 3, 5 and 7, with no spikes at x = 4, 6 and 8 through 24.
With df.time.hist(bins=24), the following chart is produced:
Here, it is a bit better: at least the first 4 spikes are located at x = 1, 2, 3 and 5, with x = 4 and x = 6 left blank. However, at x = 7 the spike is drawn to the left of the grid line, while the other 4 spikes are drawn to the right of their grid lines. Also, the x-axis stops at 7, so the empty hours x = 8 through 24 are not shown.
So, how do I do it?
Upvotes: 0
Views: 462
Reputation: 40481
Try this:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = {'time':[7, 1, 2, 7, 2, 2, 1, 2, 7, 3, 5], 'event':['a', 'b', 'a', 'a', 'b', 'a', 'a', 'b', 'b', 'b', 'a']}
df = pd.DataFrame(data)
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(16, 10))
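# explicit bin edges 1..25 give one unit-wide bin per hour (1 to 24)
df.hist(ax=axes, bins=range(1, 26))
# offset the xticks so each tick sits at the center of its bar
axes.set_xticks(np.arange(1, 25) + .5)
# label each tick with its hour
axes.set_xticklabels(range(1, 25))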
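If you also want the counts split by event type, as the question asks, a histogram may not be the most direct tool. One possible sketch, assuming hours are numbered 1 to 24 and that a grouped bar chart is acceptable: count the (time, event) pairs with pd.crosstab, then reindex so that hours missing from the data appear as zero rows. The names hours and counts are only illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'time': [7, 1, 2, 7, 2, 2, 1, 2, 7, 3, 5],
        'event': ['a', 'b', 'a', 'a', 'b', 'a', 'a', 'b', 'b', 'b', 'a']}
df = pd.DataFrame(data)

# assumption: the hours of the day are numbered 1..24, as in the question
hours = range(1, 25)

# rows = hour, columns = event type, values = number of occurrences;
# reindex adds the hours that never occur and fills them with 0
counts = pd.crosstab(df['time'], df['event']).reindex(hours, fill_value=0)

# bar positions are categorical, so every hour gets its own slot and tick
ax = counts.plot.bar(figsize=(16, 10), width=0.8)
ax.set_xlabel('hour')
ax.set_ylabel('number of events')
```

Call plt.show() (or fig.savefig) afterwards to see the chart. Because the bars are drawn at categorical positions, there is no bin-edge alignment to worry about, and empty hours simply show no bar.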
Upvotes: 2