avirr

Reputation: 668

Bar plot with irregular spacing

I am using a bar chart to plot query frequencies, but I consistently see uneven spacing between the bars. These look like they should be related to the ticks, but they're in different positions.

This shows up in larger plots:

[image: bar chart, 0 to 2000, with irregular spacing]

And in smaller ones:

[image: bar chart, 0 to 100, with irregular spacing]


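import matplotlib.pyplot as plt


def TestPlotByFrequency(df, f_field, freq, description):
    # The first `freq` rows (by position) of the chosen column; `description` is unused here
    top = df[f_field].iloc[:freq]

    fig, ax = plt.subplots()
    # One bar per index value, using the default bar width of 0.8
    ax.bar(top.index, top.values)

    plt.show()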

This is not related to the data either: none of the top entries have the same frequency count.

    count
0   8266
1   6603
2   5829
3   4559
4   4295
5   4244
6   3889
7   3827
8   3769
9   3673
10  3606
11  3479
12  3086
13  2995
14  2945
15  2880
16  2847
17  2825
18  2719
19  2631
20  2620
21  2612
22  2590
23  2583
24  2569
25  2503
26  2430
27  2287
28  2280
29  2234
30  2138

Is there any way to make these consistent?

Upvotes: 9

Views: 2603

Answers (2)

Harvey Williams

Reputation: 19

This is an aliasing problem: you've got more data points than pixels to represent them, and it is made particularly visible by the whitespace between the bars. The plot can be made to look nicer by setting an appropriate value for the "width" argument of bar() so that there is no whitespace between bars. Here I use width=1 (the default is 0.8). The code below takes around 40 seconds to run on my medium-spec laptop.

import matplotlib.pyplot as plt
import numpy as np
fig, axes = plt.subplots(1,3)
ax1 = axes[0]
ax2 = axes[1]
ax3 = axes[2]

#Get some high-density data
raw=np.abs(np.random.normal(10000,1000,2000000).astype(np.int16))
counts=np.bincount(raw)
bins=np.arange(len(counts))


#Three different ways to plot the histogram
ax1.bar(bins, counts)
ax2.bar(bins, counts, width=1)
ax3.fill_between(bins, counts, step='mid') #Setting step='mid' centres the bars on the data


#Tidy up and plot
ax3.set_ylim(bottom=0)
for ax in axes: ax.set_xlim(5000, 15000)
ax1.set_title("plt.bar")
ax2.set_title("plt.bar, width=1")
ax3.set_title("plt.fill_between, step='mid'")
fig.tight_layout()
plt.savefig('norm.png')
plt.savefig("norm.svg",format='svg') #vector graphics means all data is represented in the image
plt.show()

[image: demonstration of image aliasing in pyplot histograms]

Other keyword arguments you might want to pass to pyplot.bar are linewidth=0 and edgecolor='none', to ensure that no edges are drawn on the bars.
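Those keyword arguments can be combined with width=1 in a single call. The sketch below uses synthetic counts (not the question's data) and the headless Agg backend so it runs without a display. Note that the string 'none', rather than None, is what explicitly disables the edge colour; None falls back to the style's default.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, illustrative counts (not the question's data)
rng = np.random.default_rng(0)
counts = rng.integers(2000, 8000, size=2000)
bins = np.arange(len(counts))

fig, ax = plt.subplots()
# width=1 closes the gaps; linewidth=0 and edgecolor="none" suppress bar outlines
bars = ax.bar(bins, counts, width=1, linewidth=0, edgecolor="none")
# fig.savefig("bars.png") or plt.show() (with an interactive backend) to view it
```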

An alternative is to use a different plotting tool such as plt.fill_between or plt.step or plt.stairs. I chose to use fill_between as the other two options made filling the bars difficult.
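For what it's worth, plt.stairs (and Axes.stairs, available since matplotlib 3.4) accepts fill=True, which draws the whole histogram as one filled patch. A minimal sketch, assuming synthetic data similar to the answer's example and edges offset by 0.5 so each step is centred on its integer bin:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic high-density counts, in the spirit of the example above
rng = np.random.default_rng(0)
raw = rng.normal(10000, 1000, 200_000).astype(int)
counts = np.bincount(np.clip(raw, 0, None))  # bincount needs non-negative ints

# stairs needs one more edge than values; the -0.5 offset centres each step on its bin
edges = np.arange(len(counts) + 1) - 0.5

fig, ax = plt.subplots()
step = ax.stairs(counts, edges, fill=True)  # a single filled StepPatch, no per-bar gaps
ax.set_xlim(5000, 15000)
```

Because there is only one patch, there are no gaps between bars to alias, although the pixel-count issue described next still applies.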

Note that even once you've made the whitespace aliasing disappear, you still have more data points than pixels to represent them. You may want to investigate how this is rendered: if the values are being averaged, this will smooth the data and hide fluctuations that you may be trying to catch.
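One rough way to check whether you are in this regime is to convert a bar's width from data units to display pixels with the axes' transData transform. The sketch below assumes a 4 by 3 inch figure at 100 dpi with default subplot margins and one data unit per bar (as with ax.bar(range(n), ...)); bar_width_in_pixels is a hypothetical helper, not a matplotlib function.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def bar_width_in_pixels(ax, width):
    """Hypothetical helper: display width, in pixels, of something `width` data units wide."""
    (x0, _), (x1, _) = ax.transData.transform([(0, 0), (width, 0)])
    return x1 - x0


results = {}
for n_bars in (100, 2000):
    fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
    ax.set_xlim(0, n_bars)  # one data unit per bar
    bar_px = bar_width_in_pixels(ax, 0.8)  # default bar width
    gap_px = bar_width_in_pixels(ax, 0.2)  # whitespace between neighbouring bars
    results[n_bars] = (bar_px, gap_px)
    print(f"{n_bars} bars: bar ~{bar_px:.2f} px, gap ~{gap_px:.2f} px")
    plt.close(fig)
```

With 2000 bars the gap comes out well under one pixel, so whether it is drawn depends on where each bar edge happens to fall on the pixel grid, which is consistent with the irregular spacing in the question.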

Upvotes: 2

JohanC

Reputation: 80509

The problem has to do with aliasing: the bars are too thin to really be separated. Depending on the subpixel position where a bar starts, the white space will be visible or not. The dpi can be set either for the displayed figure or when saving the image. However, if you have too many bars, increasing the dpi will only help a little.
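To see why raising the dpi helps only up to a point, here is a minimal sketch (assuming the Agg backend and a hypothetical helper png_size) that saves the same figure in memory at two dpi values and reads the pixel dimensions straight from the PNG header. Tripling the dpi triples the pixels along each axis, so each bar gets three times as many pixels, which may still be under one pixel per bar when there are thousands of them.

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def png_size(fig, dpi):
    """Hypothetical helper: save `fig` as PNG in memory at `dpi`, return (width, height)."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=dpi)
    data = buf.getvalue()
    # PNG layout: 8-byte signature, 4-byte chunk length, b"IHDR", big-endian width and height
    width, height = struct.unpack(">II", data[16:24])
    return width, height


fig, ax = plt.subplots(figsize=(4, 3))
ax.bar(np.arange(500), np.ones(500))
sizes = {dpi: png_size(fig, dpi) for dpi in (100, 300)}
print(sizes)
```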

As suggested in this post, you can also save the image as svg to get a vector format. Depending on where you want to use it, it can be rendered perfectly.

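import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Raise the resolution of displayed figures (the default is 100 dpi)
matplotlib.rcParams['figure.dpi'] = 300

t = np.linspace(0.0, 2.0, 50)
s = 1 + np.sin(2 * np.pi * t)

df = pd.DataFrame({'time': t, 'voltage': s})

fig, ax = plt.subplots()
# Make each bar 95% of the sample spacing so neighbouring bars stay separated
ax.bar(df['time'], df['voltage'], width=(t[1] - t[0]) * 0.95)

# The dpi used when saving can be set independently of the display dpi
plt.savefig("test.png", dpi=300)
plt.show()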

Image with 100 dpi: [image: bar chart rendered at 100 dpi]

Image with 300 dpi: [image: bar chart rendered at 300 dpi]

Upvotes: 8
