Souparno

Reputation: 3

Colours overlapping in bar graph

I'm trying to use a different colour for each species in bar graphs of the iris dataset, but all the colours overlap on every bar of the graph.

[Image: the bar graph]

I used this code:

fig,axes=plt.subplots(2,2,figsize=(8,8))
c=['red','green','orange']
axes[0,0].set_title('Sepal Length')
axes[0,0].bar(df['Species'],df['SepalLengthCm'],color=c)
axes[0,1].set_title('Sepal Width')
axes[0,1].bar(df['Species'],df['SepalWidthCm'],color=c)
axes[1,0].set_title('Petal Length')
axes[1,0].bar(df['Species'],df['PetalLengthCm'],color=c)
axes[1,1].set_title('Petal Width')
axes[1,1].bar(df['Species'],df['PetalWidthCm'],color=c)
plt.show()

Upvotes: 0

Views: 91

Answers (2)

JohanC

Reputation: 80409

The following tries to explain what's going on in your original code. To avoid overcrowding, it uses 30 random rows of the iris dataset.

df['species'] is the species column, containing values ['versicolor', 'setosa', 'setosa', 'virginica', 'versicolor', 'versicolor', 'setosa', 'virginica', 'setosa', 'virginica', ...].

df['sepal_length'] contains [6.4, 5.5, 4.7, 6.7, 5.8, 5.6, 4.8, 7.1, 5.8, 6.7, ...].

Then ax.bar(df['species'], df['sepal_length'], color=c) will create 30 bars in this example, one per row. The first is for versicolor with height 6.4, the next for setosa with height 5.5, then setosa again with height 4.7. Because that bar has the same x-value as the previous setosa bar, it is drawn on top of it. The colors don't correspond to the species; the 3 colors are simply repeated in order for each subsequent row.

It's easier to see using the index to position the bars, and then to imagine all these bars superimposed depending on the species.

import seaborn as sns # easy way to get the iris dataset
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(20230210)
fig, axs = plt.subplots(ncols=2, squeeze=False, figsize=(15, 4), gridspec_kw={'width_ratios': [10, 1]})
iris = sns.load_dataset('iris')
df = iris.iloc[np.random.choice(len(iris), replace=False, size=30)].reset_index()
c = ['red', 'green', 'orange']
axs[0, 0].bar(df.index, df['sepal_length'], color=c)
axs[0, 0].margins(x=0.02)
axs[0, 0].set_xticks(df.index, df['species'], rotation=30)
axs[0, 0].set_title('The separate bars drawn')
axs[0, 1].bar(df['species'], df['sepal_length'], color=c)
axs[0, 1].tick_params(axis='x', rotation=30)
axs[0, 1].set_title('Bars superimposed\nper species')

plt.tight_layout()
plt.show()

[Image: why bars are superimposed in bar plot]
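The colour assignment described above can also be checked without plotting anything. This is only a minimal sketch: the six species values are a made-up sample mirroring the first rows quoted earlier, and `bar_colours` and `colours_per_species` are names introduced here for illustration.

```python
from collections import defaultdict

# Hypothetical six-row sample, mirroring the first rows quoted in the explanation
species = ["versicolor", "setosa", "setosa", "virginica", "versicolor", "versicolor"]
c = ["red", "green", "orange"]

# A colour list shorter than the number of bars is repeated in row order
bar_colours = [c[i % len(c)] for i in range(len(species))]

# Collect the colours of all bars that end up at the same x position
colours_per_species = defaultdict(list)
for name, colour in zip(species, bar_colours):
    colours_per_species[name].append(colour)

for name, colours in colours_per_species.items():
    print(name, colours)
```

Every species that occurs more than once collects several different colours at the same x position, which is why the bars look like overlapping layers instead of one colour per species.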

Upvotes: 1

Matt Pitkin

Reputation: 6417

As suggested by @JohanC in the comments, I'd recommend using a Seaborn barplot for this, but if you just want to use Matplotlib you could do the following (note that this assumes you want the mean value plotted in each case):

import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
c = ['red', 'green', 'orange']

cols = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']

for ax, col in zip(axes.flatten(), cols):
    means = df.groupby("Species")[col].mean()  # mean per species, index sorted by species
    ax.bar(
        means.index,  # labels in the same order as the means (unique() keeps order of appearance)
        means,
        color=c
    )
    ax.set_title(col)

plt.show()

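The Seaborn barplot by default calculates the mean and adds an error bar. In recent Seaborn versions that error bar is a 95% bootstrap confidence interval rather than the standard deviation (errorbar='sd' switches to the latter). A mean with a standard-deviation error bar could be done in Matplotlib with: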

for ax, col in zip(axes.flatten(), cols):
    grouped = df.groupby("Species")[col]
    means = grouped.mean()  # get the mean values
    ax.bar(
        means.index,  # labels in the same order as the means
        means,
        yerr=grouped.std(),  # add error (sample standard deviation)
        color=c,
    )
    ax.set_title(col)
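If you want each species to keep its colour no matter how the rows are ordered, one option is to look the colour up by species name instead of relying on list position. A minimal sketch, assuming species labels like 'Iris-setosa'; the small DataFrame and the `palette` dict are made up here for illustration and are not the real dataset:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's df (column names as in the question)
df = pd.DataFrame({
    "Species": ["Iris-virginica", "Iris-setosa", "Iris-versicolor",
                "Iris-setosa", "Iris-virginica", "Iris-versicolor"],
    "SepalLengthCm": [6.3, 5.1, 7.0, 4.9, 5.8, 6.4],
})

# Assumed colour choice: one fixed colour per species
palette = {"Iris-setosa": "red", "Iris-versicolor": "green", "Iris-virginica": "orange"}

means = df.groupby("Species")["SepalLengthCm"].mean()  # index holds the sorted species names
colours = [palette[s] for s in means.index]            # colour looked up by name, not by position

fig, ax = plt.subplots(figsize=(4, 4))
bars = ax.bar(means.index, means.values, color=colours)
ax.set_title("Sepal Length (mean)")
```

Call plt.show() or fig.savefig(...) afterwards to see the result. The same lookup works inside the loop over the four columns.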

Upvotes: 0
