Chloe D

Reputation: 41

python bar chart not centered

I am trying to build a simple histogram, but the bars are not lining up with the x-axis ticks. As you can see in this picture, the bar over "3" is shifted to the right. I am not sure what caused it. I set align='mid', but that did not fix it.

This is the code that I used to create it:

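import matplotlib.pyplot as plt

def createBarChart(colName):
    # Draw one histogram for the given column of df.
    df[colName].hist(align='mid')
    plt.title(str(colName))
    RUNS = [1,2,3,4,5]
    plt.xticks(RUNS)
    plt.show()

# Plot every column of the data frame.
for column in df.columns:
    createBarChart(column)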

And this is what I got: bar is not centered over 3

To recreate my data:

df = pd.DataFrame(np.random.randint(1,6,size=(100, 4)), columns=list('ABCD'))

Thank you for your help!

P.S. I don't know if this is relevant, but I am using the seaborn-whitegrid style. I tried to recreate the plot with sample data and the problem still shows up. Is it a bug?

hist created using random data

Upvotes: 4

Views: 1830

Answers (2)

mostlyoxygen

Reputation: 991

The hist function is behaving exactly as it is supposed to. By default it splits the data you pass into 10 bins, with the left edge of the first bin at the data's minimum value and the right edge of the last bin at its maximum. The chart below shows the randomly generated data binned this way, with red dashed lines to mark the edges of the bins.

Histogram showing bin edges
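To see the effect in numbers rather than on a chart, you can compute the default bins directly. This is a minimal sketch, assuming the data contains every integer from 1 to 5; the `values` array is made up for illustration, and `np.histogram` is used here because it applies the same default of 10 equal-width bins between the minimum and maximum.

```python
import numpy as np

# Made-up sample: the integers 1..5, each repeated a few times.
values = np.repeat([1, 2, 3, 4, 5], [3, 5, 4, 6, 2])

# Default binning: 10 equal-width bins spanning [min, max],
# so each bin is (5 - 1) / 10 = 0.4 wide.
counts, edges = np.histogram(values, bins=10)

# Centre of every bin, then keep only the bins that contain data.
centres = (edges[:-1] + edges[1:]) / 2
occupied = np.round(centres[counts > 0], 1)

# The occupied bins are centred near 1.2, 2.0, 3.2, 4.0 and 4.8,
# so the bar holding the 3s sits to the right of the tick at 3.
print(occupied)
```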

The way around this is to define the bin edges yourself, with a slight adjustment to the minimum and maximum values to centre the bars over the x axis ticks. This can be done quite easily with numpy's linspace function (using column A in the randomly generated data frame as an example):

bins = np.linspace(df["A"].min() - .5, df["A"].max() + .5, 6)
df["A"].hist(bins=bins)

We ask for 6 values because we are defining the bin edges, and 6 edges give 5 bins, as shown in this chart:

Histogram with 5 bins
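If you would rather not hard-code the number of edges, the same half-unit offset can be derived from the data. This is only a sketch, assuming the column holds integers; `centred_bins` is a hypothetical helper name, not something pandas or NumPy provides.

```python
import numpy as np

def centred_bins(values):
    """Return unit-width bin edges at half-integers covering integer data."""
    lo, hi = int(np.min(values)), int(np.max(values))
    # hi - lo + 1 bins need hi - lo + 2 edges: lo - 0.5, lo + 0.5, ..., hi + 0.5.
    return np.arange(lo, hi + 2) - 0.5

# Made-up integer data in the range 1..5.
edges = centred_bins([1, 2, 3, 4, 5, 3, 3])
print(edges)
```

With the question's data frame this could be passed as df["A"].hist(bins=centred_bins(df["A"])).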

If you want to keep the gaps between the bars, you can increase the number of bins to 9 and adjust the offset slightly, but this won't work in all cases (it works here because every value is either 1, 2, 3, 4 or 5).

bins = np.linspace(df["A"].min() - .25, df["A"].max() + .25, 10)
df["A"].hist(bins=bins)

Histogram with empty bins
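The gaps come from every second bin being empty. A small sketch of that, using a made-up `values` array in place of the real column and `np.histogram` to show the counts per bin:

```python
import numpy as np

# Made-up integer data in the range 1..5.
values = np.array([1, 2, 2, 3, 3, 3, 4, 5, 5])

# 10 edges from 0.75 to 5.25 give 9 bins, each 0.5 wide.
edges = np.linspace(values.min() - 0.25, values.max() + 0.25, 10)
counts, _ = np.histogram(values, bins=edges)

# Integers land in bins 0, 2, 4, 6 and 8; the bins in between stay
# empty, which is what produces the gaps between the bars.
print(counts.tolist())
```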

Finally, as this data contains discrete values and really you are plotting the counts, you could use the value_counts function to create a series that can then be plotted as a bar chart:

df["A"].value_counts().sort_index().plot(kind="bar")
# Provide a 'color' argument if you need all of the bars to look the same.
df["A"].value_counts().sort_index().plot(kind="bar", color="steelblue")

Histogram created with value_counts
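One thing to watch with this approach: value_counts only lists values that actually occur, so a value with no observations gets no bar at all. A possible workaround, sketched here with a made-up series and assuming the expected values are 1 to 5, is to reindex the counts before plotting:

```python
import pandas as pd

# Made-up data in which the value 3 never occurs.
s = pd.Series([1, 1, 2, 4, 5, 5, 5])

# Count each value, then reindex so every expected value gets an entry;
# values that never occur are filled with a count of 0.
counts = s.value_counts().sort_index().reindex(range(1, 6), fill_value=0)
print(counts.tolist())
```

The resulting series can be plotted with .plot(kind="bar") in the same way as above.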

Upvotes: 2

Shodmoth

Reputation: 155

Try something like this to center every histogram bar on its tick:

plt.hist(data, bins=range(1,7), align='left', rwidth=1, density=True)

Replace data with your own values. Older Matplotlib releases called the density argument normed; normed was removed in Matplotlib 3.1.

Upvotes: 0
