Simon

Reputation: 10150

matplotlib categorical bar chart creates unwanted whitespace

I have a dataframe that looks like this:

import numpy as np
import pandas as pd

location = list(range(1, 34))
location += [102, 172]
stress = np.random.randint(1,1000, len(location))
group = np.random.choice(['A', 'B'], len(location))

df = pd.DataFrame({'location':location, 'stress':stress, 'group':group})
df[['location', 'group']] = df[['location', 'group']].astype(str)

Note: location and group are both strings

I'm trying to create a bar plot with location (categorical) on the x-axis and stress as the height of each bar. I also want each bar coloured according to its group.

I've tried the following:

import matplotlib.pyplot as plt

f, axarr = plt.subplots(1, 1)
axarr.bar(df['location'], df['stress'])
plt.xticks(np.arange(df.shape[0]) + 1, df['location'])
plt.show()

However, this produces:

[image: the resulting bar chart]

I'm not sure why there are blank spaces between the end bars. I'm guessing it's because of the 102 and 172 values in location. However, that column is a string, so I expected it to be treated as a categorical variable, with all bars placed next to each other regardless of the location "value". I tried to correct for this by manually specifying the xtick locations and labels, but it didn't seem to work.

Finally, is there a quick way to colour each bar by group without having to manually iterate over each unique group value?

Upvotes: 1

Views: 520

Answers (1)

Y. Luo

Reputation: 5722

If location is categorical data, don't pass it to bar as the x values. Use np.arange(df.shape[0]) for the bar positions and set the tick labels afterwards:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

location = list(range(1, 34))
location += [102, 172]
stress = np.random.randint(1, 1000, len(location))
group = np.random.choice(['A', 'B'], len(location))

df = pd.DataFrame({'location': location, 'stress': stress, 'group': group})
df[['location', 'group']] = df[['location', 'group']].astype(str)

f, axarr = plt.subplots(1, 1)
# Plot against integer positions so the bars are evenly spaced.
# align='center' (the default since Matplotlib 2.0) centres each bar on its position.
positions = np.arange(df.shape[0])
bars = axarr.bar(positions, df['stress'], align='center')
# Colour each bar according to its group.
for b, g in zip(bars.patches, df['group']):
    if g == 'A':
        b.set_color('b')
    elif g == 'B':
        b.set_color('r')
# Label each bar with its location string; no offset is needed for centred bars.
plt.xticks(positions, df['location'])
plt.setp(axarr.xaxis.get_ticklabels(), rotation=90)
plt.show()

I don't know of a concise way to set the bar colours in bulk, but iterating over the bars is not too bad.
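One possible way to avoid the loop, offered as a sketch rather than as part of the original answer: bar accepts a sequence of colours through its color argument, so the group column can be mapped to colours in one step. The palette dictionary and the seeded random generator below are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the question; the seed is only there for repeatability.
rng = np.random.default_rng(0)
location = list(range(1, 34)) + [102, 172]
df = pd.DataFrame({
    'location': [str(v) for v in location],
    'stress': rng.integers(1, 1000, len(location)),
    'group': rng.choice(['A', 'B'], len(location)),
})

# Hypothetical group-to-colour mapping; any valid Matplotlib colours would do.
palette = {'A': 'b', 'B': 'r'}
colors = df['group'].map(palette).tolist()

# Integer positions keep the bars evenly spaced; one colour is given per bar.
positions = np.arange(len(df))
fig, ax = plt.subplots()
bars = ax.bar(positions, df['stress'], color=colors, align='center')

# Label each bar with its location string.
ax.set_xticks(positions)
ax.set_xticklabels(df['location'], rotation=90)
```

Call plt.show() or fig.savefig(...) afterwards to view the result. A group value missing from palette would map to NaN, so the dictionary should cover every group present in the data.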

Upvotes: 1
