Reputation: 41
I am trying to build a simple histogram, but the bars are not lining up with the ticks. As you can see in this picture, the bar over "3" is shifted to the right. I am not sure what caused it. I set align='mid', but that did not fix it.
This is the code that I used to create it:
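import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def createBarChart(colName):
    # Draw a histogram of one column with ticks at the possible values.
    df[colName].hist(align='mid')
    plt.title(str(colName))
    RUNS = [1, 2, 3, 4, 5]
    plt.xticks(RUNS)
    plt.show()

# Loop over the column names of the data frame (df is defined below).
for column in df.columns:
    createBarChart(column)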
And this is what I got:
[image: bar is not centered over 3]
To recreate my data:
df = pd.DataFrame(np.random.randint(1,6,size=(100, 4)), columns=list('ABCD'))
Thank you for your help!
P.S. I don't know if this is relevant, but I am using the seaborn-whitegrid style. I tried to recreate the plot with sample data and the problem still shows up. Is it a bug?
[image: hist created using random data]
Upvotes: 4
Views: 1830
Reputation: 991
The hist function is behaving exactly as it is supposed to. By default it splits the data you pass into 10 equal-width bins, with the left edge of the first bin at the data's minimum value and the right edge of the last bin at its maximum. The chart below shows the randomly generated data binned this way, with red dashed lines to mark the edges of the bins.
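To see where the default edges fall without plotting anything, here is a minimal sketch using NumPy's np.histogram, on the assumption that it mirrors the default binning described above (10 equal-width bins between the minimum and the maximum). The data array is a made-up stand-in for one column of values from 1 to 5.

```python
import numpy as np

# Hypothetical stand-in for one column: the integers 1..5, 20 of each.
data = np.repeat(np.arange(1, 6), 20)

# 10 equal-width bins spanning [min, max], as in a default hist call.
counts, edges = np.histogram(data, bins=10)

# Centre of each bin; a bar is drawn around this position.
centres = (edges[:-1] + edges[1:]) / 2

# Report only the bins that actually contain data.
for i in np.nonzero(counts)[0]:
    print(f"bin [{edges[i]:.1f}, {edges[i + 1]:.1f}] "
          f"centre {centres[i]:.1f} count {counts[i]}")
```

With this data the value 3 sits on the left edge of the bin that runs from 3.0 to 3.4, so its bar is drawn to the right of the tick at 3.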
The way around this is to define the bin edges yourself, with a slight adjustment to the minimum and maximum values to centre the bars over the x axis ticks. This can be done quite easily with numpy's linspace function (using column A in the randomly generated data frame as an example):
bins = np.linspace(df["A"].min() - .5, df["A"].max() + .5, 6)
df["A"].hist(bins=bins)
We ask for 6 values because we are defining the bin edges, and 6 edges result in 5 bins, as shown in this chart:
If you wanted to keep the gaps between the bars you can increase the number of bins to 9 and adjust the offset slightly, but this wouldn't work in all cases (it works here because every value is either 1, 2, 3, 4 or 5).
bins = np.linspace(df["A"].min() - .25, df["A"].max() + .25, 10)
df["A"].hist(bins=bins)
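As a quick check of both sets of edges, the sketch below counts the data with np.histogram instead of plotting it. The data array is a hypothetical stand-in for df["A"], and the variable names are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for df["A"]: the integers 1..5, 20 of each.
data = np.repeat(np.arange(1, 6), 20)

# Five unit-wide bins: edges at 0.5, 1.5, ..., 5.5.
wide_edges = np.linspace(data.min() - 0.5, data.max() + 0.5, 6)
wide_counts, _ = np.histogram(data, bins=wide_edges)
wide_centres = (wide_edges[:-1] + wide_edges[1:]) / 2

# Nine half-unit bins: edges at 0.75, 1.25, ..., 5.25.
narrow_edges = np.linspace(data.min() - 0.25, data.max() + 0.25, 10)
narrow_counts, _ = np.histogram(data, bins=narrow_edges)

print(wide_centres)   # bin centres for the five-bin version
print(wide_counts)    # one count per integer value
print(narrow_counts)  # every second bin is empty, which leaves the gaps
```

In the five-bin version each bin is centred on an integer. In the nine-bin version the integers land in alternating bins, which is why the gaps only appear when the data holds nothing but whole numbers.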
Finally, as this data contains discrete values and really you are plotting the counts, you could use the value_counts method to create a series that can then be plotted as a bar chart:
df["A"].value_counts().sort_index().plot(kind="bar")
# Provide a 'color' argument if you need all of the bars to look the same.
df["A"].value_counts().sort_index().plot(kind="bar", color="steelblue")
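One caveat with this approach is that value_counts only reports values that actually occur, so a value with no observations gets no bar at all. Below is a minimal sketch of one way to handle that, assuming the full set of possible values (1 to 5 here) is known in advance. The scores series is invented for illustration.

```python
import pandas as pd

# Hypothetical data in which the value 3 never occurs.
scores = pd.Series([1, 1, 2, 4, 4, 4, 5])

# Count each value, then reindex so that absent values get a count of 0.
counts = scores.value_counts().reindex(range(1, 6), fill_value=0)

print(counts)
# counts.plot(kind="bar") would then leave an empty slot at 3.
```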
Upvotes: 2
Reputation: 155
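Try something like this to centre every bar over its tick. With bins=range(1, 7) the bin edges sit at 1, 2, ..., 6, and align='left' centres each bar on the left edge of its bin:
plt.hist(data, bins=range(1, 7), align='left', rwidth=1, density=True)
Replace data with your own values, for example df["A"]. Note that density=True normalises the bar heights; it takes the place of the older normed=True argument, which was removed in Matplotlib 3.1. Drop it if you want raw counts.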
Upvotes: 0