I am using a bar chart to plot query frequencies, but I consistently see uneven spacing between the bars. These gaps look like they should be related to the ticks, but they're in different positions.
import matplotlib.pyplot as plt

def TestPlotByFrequency(df, f_field, freq, description):
    # Plot the first `freq` entries of df[f_field]: index on the x-axis, count as bar height
    top = df[f_field].iloc[0:freq]
    fig, ax = plt.subplots()
    ax.bar(top.index, top.values)
    plt.show()
This is not related to the data either: none of the top entries have the same frequency count.
count
0 8266
1 6603
2 5829
3 4559
4 4295
5 4244
6 3889
7 3827
8 3769
9 3673
10 3606
11 3479
12 3086
13 2995
14 2945
15 2880
16 2847
17 2825
18 2719
19 2631
20 2620
21 2612
22 2590
23 2583
24 2569
25 2503
26 2430
27 2287
28 2280
29 2234
30 2138
Is there any way to make these consistent?
This is an aliasing problem: you've got more data points than pixels to represent them, and the whitespace between the bars makes it particularly visible. The plot can be made to look nicer by setting an appropriate value for the width argument of bar() so that there is no whitespace between bars. Here I use width=1 (the default is 0.8). The code below takes around 40 seconds to run on my medium-spec laptop.
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 3)
ax1, ax2, ax3 = axes

# Get some high-density data: 2 million samples around 10000, counted per integer value
raw = np.abs(np.random.normal(10000, 1000, 2000000).astype(np.int16))
counts = np.bincount(raw)
bins = np.arange(len(counts))

# Three different ways to plot the histogram
ax1.bar(bins, counts)                       # default width=0.8 leaves gaps that alias
ax2.bar(bins, counts, width=1)              # bars touch, so there are no gaps to alias
ax3.fill_between(bins, counts, step='mid')  # step='mid' centres the steps on the data

# Tidy up and plot
ax3.set_ylim(bottom=0)
for ax in axes:
    ax.set_xlim(5000, 15000)
ax1.set_title("plt.bar")
ax2.set_title("plt.bar, width=1")
ax3.set_title("plt.fill_between, step='mid'")
fig.tight_layout()
plt.savefig('norm.png')
plt.savefig("norm.svg", format='svg')  # vector graphics means all data is represented in the image
plt.show()
Other keyword arguments you might want to pass to pyplot.bar are linewidth=0 and edgecolor=None, to ensure that no edges are drawn on the bars.
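Applied to the function from the question, the width fix might look like the sketch below. The name `plot_by_frequency` is a hypothetical rewrite of `TestPlotByFrequency`; it returns the figure and axes instead of calling `plt.show()` so the result can be inspected, and the small DataFrame is synthetic data copied from the first rows of the table above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_frequency(df, f_field, freq):
    """Hypothetical rewrite of TestPlotByFrequency: bars with no gaps and no outlines."""
    top = df[f_field].iloc[:freq]
    fig, ax = plt.subplots()
    # width=1 makes adjacent bars touch; linewidth=0 suppresses the bar outlines
    ax.bar(top.index, top.values, width=1, linewidth=0)
    return fig, ax


# Synthetic stand-in for the question's frequency table
df = pd.DataFrame({"count": [8266, 6603, 5829, 4559, 4295, 4244]})
fig, ax = plot_by_frequency(df, "count", 5)
```

Because the bars now tile the axis with no whitespace, there is no gap whose visibility depends on where a bar edge falls relative to the pixel grid.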
An alternative is to use a different plotting function such as plt.fill_between, plt.step or plt.stairs. I chose fill_between because the other two options made filling the bars difficult.
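For completeness, a filled plt.stairs version might be sketched as below, assuming a Matplotlib version that provides `Axes.stairs` (added in 3.4) with its `fill` parameter. It uses fewer samples than the code above purely so it runs quickly, and the half-bin offset on the edges is my own choice so that each step is centred on its integer value, mirroring `step='mid'`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
raw = np.abs(rng.normal(10000, 1000, 200_000)).astype(np.int64)
counts = np.bincount(raw)

# stairs() takes N values and N+1 edges; shifting by 0.5 centres each step on its integer bin
edges = np.arange(len(counts) + 1) - 0.5

fig, ax = plt.subplots()
steps = ax.stairs(counts, edges, fill=True)  # one filled patch, so no per-bar gaps
ax.set_xlim(5000, 15000)
ax.set_ylim(bottom=0)
```

Like fill_between, this draws a single outline rather than thousands of separate rectangles, so there is no inter-bar whitespace to alias.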
Please note that even once you've made the whitespace aliasing disappear, you still have more data points than pixels to represent them. You may want to investigate how this will be rendered: if the values are being averaged, this will smooth the data and hide fluctuations that you may be trying to catch.
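One way to avoid leaving that choice to the renderer is to downsample the counts yourself before plotting. The sketch below uses a hypothetical helper, `downsample_max`, that splits the data into consecutive groups and keeps each group's maximum; taking the max is an assumption that suits spotting spikes, whereas `np.add.reduceat` would preserve totals instead.

```python
import numpy as np


def downsample_max(counts, n_groups):
    """Collapse `counts` into at most `n_groups` consecutive groups, keeping each group's maximum.

    Returns the start index of each group and the group maxima, so isolated
    spikes survive instead of being averaged away.
    """
    counts = np.asarray(counts)
    starts = np.linspace(0, len(counts), n_groups, endpoint=False).astype(int)
    starts = np.unique(starts)  # drop duplicate starts when n_groups > len(counts)
    return starts, np.maximum.reduceat(counts, starts)


spiky = np.array([1, 5, 2, 2, 9, 1, 3, 0])
starts, peaks = downsample_max(spiky, 4)
# starts -> [0, 2, 4, 6], peaks -> [5, 2, 9, 3]
```

Choosing `n_groups` close to the axes width in pixels gives roughly one value per pixel column, which can then be plotted with `ax.bar(starts, peaks, width=np.diff(np.append(starts, len(spiky))), align='edge')`.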
The problem has to do with aliasing: the bars are too thin to really be separated. Depending on the sub-pixel position where a bar starts, the white space will be visible or not. Raising the dpi gives each bar more pixels, and it can be set either for the displayed figure or when saving the image. However, if you have too many bars, increasing the dpi will only help a little.
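To see why a higher dpi only helps a little, it can be useful to estimate how many pixels each bar actually gets. The sketch below is a hypothetical helper, `pixels_per_bar`, assuming a single Axes with Matplotlib's default figure size and margins; it reads the Axes width in display pixels via `get_window_extent()` and divides by the number of bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def pixels_per_bar(n_bars, dpi, figsize=(6.4, 4.8)):
    """Approximate horizontal pixels available to each bar slot in a single-Axes figure."""
    fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
    axes_width_px = ax.get_window_extent().width  # Axes width in display pixels
    plt.close(fig)
    return axes_width_px / n_bars


for dpi in (100, 300):
    print(dpi, round(pixels_per_bar(2000, dpi), 3))
```

With the default rcParams, 2000 bars get roughly a quarter of a pixel each at 100 dpi and about three quarters of a pixel at 300 dpi, so even tripling the dpi still leaves every bar narrower than one pixel.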
As suggested in this post, you can also save the image as SVG to get a vector format. Depending on where you want to use it, it can then be rendered perfectly.
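import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Render displayed figures at 300 dpi so thin bars get more pixels
matplotlib.rcParams['figure.dpi'] = 300

t = np.linspace(0.0, 2.0, 50)
s = 1 + np.sin(2 * np.pi * t)
df = pd.DataFrame({'time': t, 'voltage': s})

fig, ax = plt.subplots()
# Bar width is 95% of the sample spacing, leaving a small gap between bars
ax.bar(df['time'], df['voltage'], width=(t[1] - t[0]) * 0.95)
plt.savefig("test.png", dpi=300)  # dpi can also be set just for the saved file
plt.show()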