Reputation: 319
I have a Pandas column whose values have a resolution of 0.0001 (they differ only in the fourth decimal place).
I would like to plot a histogram with one bar for each distinct 0.0001 value.
I can get fairly fine granularity with
plt.hist(df['data'], bins=500)
but I would like to see the count for each unique value.
How would I go about doing this? Thank you.
Upvotes: 0
Views: 4128
Reputation: 22449
Wouldn't it be easier to use a Counter?
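from collections import Counter
import matplotlib.pyplot as plt
cntr = Counter(df['data'])  # distinct value -> number of occurrences
# bar width matches the 0.0001 spacing; the default width of 0.8 would make every bar overlap
plt.bar(list(cntr.keys()), list(cntr.values()), width=0.0001)
plt.show()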
This way you don't have to choose the bins a priori; you only need to set the bar width (about 0.0001 here) so that adjacent bars do not overlap.
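As a cross-check of the Counter idea, here is a minimal sketch on a small, made-up sample (the values are hypothetical, chosen only for illustration). It also shows pandas' own Series.value_counts, which produces the same tally. Rounding the column to four decimals first is an added assumption: it guards against values that should be equal but differ in the last floating-point bits, which Counter would otherwise count as separate keys.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data on a 0.0001 grid.
df = pd.DataFrame({"data": [0.0012, 0.0013, 0.0012, 0.0015, 0.0013, 0.0012]})

# Assumption: round to 4 decimals so values that should be equal also
# compare equal as floats (otherwise Counter treats them as different keys).
df["data"] = df["data"].round(4)

# Counter maps each distinct value to the number of times it occurs.
cntr = Counter(df["data"])

# pandas can produce the same tally; sort_index() orders it by value.
counts = df["data"].value_counts().sort_index()

for value, n in counts.items():
    print(f"{value:.4f}: {n}")
```

This prints one line per distinct value: 0.0012: 3, 0.0013: 2, 0.0015: 1. Either tally can then be drawn with plt.bar as above.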
Upvotes: 1
Reputation: 80289
As your values are discrete, it is important to place the bin boundaries halfway between these values. If the boundaries coincide with the values, tiny floating-point rounding errors decide which side of an edge a value lands on, and strange artifacts can happen. In the example below every value occurs 10 times, but the histogram with the boundaries on top of the values puts the last two values into the same bin:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# values from 0.0005 in steps of 0.0001, each repeated 10 times
df = pd.DataFrame({'data': np.repeat(np.arange(0.0005, 0.0030, 0.0001), 10)})
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(15, 4))
# left: edges start at the minimum, so every edge coincides with a data value
ax1.hist(df['data'], bins=np.arange(df['data'].min(), df['data'].max(), 0.0001), ec='w')
ax1.set_title('bin boundaries on top of the values')
# right: edges shifted down by half a step and extended past the maximum, so every value sits mid-bin
ax2.hist(df['data'], bins=np.arange(df['data'].min() - 0.00005, df['data'].max() + 0.0001, 0.0001), ec='w')
ax2.set_title('bin boundaries in-between the values')
plt.show()
Note that the version with the boundaries halfway between the values also puts the x-ticks nicely in the center of the bins.
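The difference between the two panels can also be checked numerically. This sketch reuses the synthetic data from the example and calls np.histogram directly (plt.hist uses it for the counting). One detail worth knowing: np.histogram treats every bin as half-open except the last, which is closed on the right. With edges on top of the values, that closed last bin is where two values end up together.

```python
import numpy as np

# The synthetic data from the example above: values from 0.0005 in steps of
# 0.0001, each repeated 10 times.
data = np.repeat(np.arange(0.0005, 0.0030, 0.0001), 10)
step = 0.0001

# Left panel: edges start at the minimum, so each edge coincides with a value;
# np.arange also stops at (or just before) data.max().
edges_on = np.arange(data.min(), data.max(), step)
# Right panel: edges shifted down by half a step and extended past the maximum.
edges_between = np.arange(data.min() - step / 2, data.max() + step, step)

counts_on, _ = np.histogram(data, bins=edges_on)
counts_between, _ = np.histogram(data, bins=edges_between)

print(counts_on)       # the closed last bin holds two values' worth (20)
print(counts_between)  # 10 in every bin
```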
Upvotes: 2
Reputation: 150735
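Instead of specifying the number of bins (bins=500), you can specify the bin edges themselves, shifted by half a step so that each value falls in the middle of its own bin:
plt.hist(df['data'], bins=np.arange(df['data'].min() - 0.00005, df['data'].max() + 0.0001, 0.0001))  # edges halfway between 0.0001-spaced values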
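If this is needed often, the edge calculation can be wrapped in a small function. The helper below, centered_bin_edges, is hypothetical (it is not part of NumPy or pandas). It counts the grid points with round() rather than relying on np.arange with a fractional step, whose length can vary with floating-point rounding (the NumPy documentation warns about non-integer steps in arange). The sample data are made up.

```python
import numpy as np
import pandas as pd


def centered_bin_edges(values, step):
    """Hypothetical helper: edges half a step either side of every grid value
    from min to max, so each distinct value falls in the middle of its own bin."""
    lo, hi = float(np.min(values)), float(np.max(values))
    # Number of grid points between lo and hi, counted robustly with round().
    n = int(round((hi - lo) / step)) + 1
    # n bins need n + 1 edges, starting half a step below the minimum.
    return lo - step / 2 + step * np.arange(n + 1)


df = pd.DataFrame({"data": [0.0012, 0.0013, 0.0012, 0.0015, 0.0013, 0.0012]})
edges = centered_bin_edges(df["data"], 0.0001)
counts, _ = np.histogram(df["data"], bins=edges)
print(counts)
```

The edges can be passed straight to plt.hist(df['data'], bins=edges). Unlike the Counter approach, grid values that never occur (0.0014 in this sample) still get a bin, which shows up as an explicit gap of zero height.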
Upvotes: 0