Reputation: 3
I'm trying to use a different colour for each species in bar graphs of the iris dataset, but all the colours end up overlapping on every bar of the graph.
I used this code:
fig,axes=plt.subplots(2,2,figsize=(8,8))
c=['red','green','orange']
axes[0,0].set_title('Sepal Length')
axes[0,0].bar(df['Species'],df['SepalLengthCm'],color=c)
axes[0,1].set_title('Sepal Width')
axes[0,1].bar(df['Species'],df['SepalWidthCm'],color=c)
axes[1,0].set_title('Petal Length')
axes[1,0].bar(df['Species'],df['PetalLengthCm'],color=c)
axes[1,1].set_title('Petal Width')
axes[1,1].bar(df['Species'],df['PetalWidthCm'],color=c)
plt.show()
Upvotes: 0
Views: 91
Reputation: 80409
The following tries to explain what's going on in your original code. To avoid overcrowding, it uses 30 random rows of the iris dataset.
df['species'] is the species column, containing values ['versicolor', 'setosa', 'setosa', 'virginica', 'versicolor', 'versicolor', 'setosa', 'virginica', 'setosa', 'virginica', ...].
df['sepal_length'] contains [6.4, 5.5, 4.7, 6.7, 5.8, 5.6, 4.8, 7.1, 5.8, 6.7, ...].
Then ax.bar(df['species'], df['sepal_length'], color=c) will create 30 (in this example) bars: one for the first value of df['species'], versicolor, with height 6.4; then one for setosa with height 5.5; then another for setosa with height 4.7. As the same x-value is used, this bar will be drawn on top of the previous one. The colors won't correspond to the species; they are just the 3 colors repeated in order for each subsequent row.
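The row-by-row colour assignment described above can be sketched in plain Python, without any plotting. This is only an illustration: it uses the first six species values quoted above and mimics how a short colour list is repeated across the bars. The names row_colors and colors_seen are made up for this sketch.

```python
from itertools import cycle

# First six species values from the sample above
species = ['versicolor', 'setosa', 'setosa', 'virginica', 'versicolor', 'versicolor']
c = ['red', 'green', 'orange']

# A colour list shorter than the number of bars is repeated in order,
# so the colour depends on the row position, not on the species
row_colors = [color for _, color in zip(species, cycle(c))]

# Collect the colours that end up stacked at each species' x-position
colors_seen = {}
for name, color in zip(species, row_colors):
    colors_seen.setdefault(name, []).append(color)

for name, colors in colors_seen.items():
    print(name, colors)
```

Each species ends up with several different colours at the same x-position, which matches the overlapping bars in the question.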
It's easier to see using the index to position the bars, and then to imagine all these bars superimposed depending on the species.
import seaborn as sns # easy way to get the iris dataset
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(20230210)
fig, axs = plt.subplots(ncols=2, squeeze=False, figsize=(15, 4), gridspec_kw={'width_ratios': [10, 1]})
iris = sns.load_dataset('iris')
df = iris.iloc[np.random.choice(len(iris), replace=False, size=30)].reset_index()
c = ['red', 'green', 'orange']
axs[0, 0].bar(df.index, df['sepal_length'], color=c)
axs[0, 0].margins(x=0.02)
axs[0, 0].set_xticks(df.index, df['species'], rotation=30)
axs[0, 0].set_title('The separate bars drawn')
axs[0, 1].bar(df['species'], df['sepal_length'], color=c)
axs[0, 1].tick_params(axis='x', rotation=30)
axs[0, 1].set_title('Bars superimposed\nper species')
plt.tight_layout()
plt.show()
Upvotes: 1
Reputation: 6417
As suggested by @JohanC in the comments, I'd recommend using a Seaborn barplot for this, but if you just want to use Matplotlib you could do the following (note that this assumes you want the mean value plotted in each case):
fig, axes = plt.subplots(2, 2, figsize=(8, 8))
c = ['red', 'green', 'orange']
cols = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
for ax, col in zip(axes.flatten(), cols):
    ax.bar(
        df["Species"].unique(),
        df.groupby("Species")[col].mean(),  # get the mean values
        color=c
    )
    ax.set_title(col)
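To see what the grouped call hands to ax.bar, here is a plain-Python sketch (no pandas, no plotting) that computes one mean per species and looks up each colour by species name. The sample values are the first ten rows quoted in the other answer, and color_by_species is a hypothetical mapping introduced only for this illustration.

```python
from collections import defaultdict
from statistics import mean

# First ten rows of the 30-row sample quoted in the other answer
species = ['versicolor', 'setosa', 'setosa', 'virginica', 'versicolor',
           'versicolor', 'setosa', 'virginica', 'setosa', 'virginica']
sepal_length = [6.4, 5.5, 4.7, 6.7, 5.8, 5.6, 4.8, 7.1, 5.8, 6.7]

# Hypothetical mapping, so each colour is tied to a species by name
color_by_species = {'setosa': 'red', 'versicolor': 'green', 'virginica': 'orange'}

# Group the values per species, similar in spirit to a groupby
groups = defaultdict(list)
for name, value in zip(species, sepal_length):
    groups[name].append(value)

# Sort the labels so labels, heights and colours share one order
labels = sorted(groups)
heights = [mean(groups[name]) for name in labels]
colors = [color_by_species[name] for name in labels]

# ax.bar(labels, heights, color=colors) would then draw one bar per species
print(list(zip(labels, heights, colors)))
```

Building labels, heights and colours from the same sorted list keeps them aligned even if the species appear in a different order in the data.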
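The Seaborn barplot by default calculates the mean and adds an error bar (a bootstrapped 95% confidence interval, unless you ask for something else such as the standard deviation). Something similar, using the standard deviation of the data for the error bars, could be done in Matplotlib with: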
for ax, col in zip(axes.flatten(), cols):
    ax.bar(
        df["Species"].unique(),
        df.groupby("Species")[col].mean(),  # get the mean values
        yerr=df.groupby("Species")[col].std(),  # add error
        color=c,
    )
    ax.set_title(col)
Upvotes: 0