Reputation: 2259
I am following an online course on Python. This is the code, verbatim. It conducts a Monte Carlo repetition of 100 random walks, 10 steps each.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(123)
final_tails = []
for x in range(100) :
    tails = [0]
    for x in range(10) :
        coin = np.random.randint(0, 2)
        tails.append(tails[x] + coin)
    final_tails.append(tails[-1])
plt.hist(final_tails, bins = 10)
plt.show()
The course says that I should get the plot without gaps. I get exactly the same bar heights, in exactly the same order, but with some odd spacing gaps between them.
Can anyone corroborate this result or explain it?
I am using:
Thanks.
AFTERNOTE
I noticed that, unlike the course's abutted bars, my bin edges align with integers. This is not ideal, since the data are integers, but each integer should still fall consistently on the same side (left or right) of a bin edge. Hence, it doesn't seem to explain the gaps. It could mean, however, that the auto-generation of bin edges changed somewhere in the evolution of matplotlib. I don't know what version the course uses.
P.S. The following indicates that the problem is that not every bin straddles an integer in the data's value range:
print( np.unique( np.array( final_tails ) ) )
print( np.unique( final_tails ) ) # data values
hist, bin_edges = np.histogram( final_tails )
print(bin_edges) # bin edges
print(hist) # bar heights
The data values are: [2 3 4 5 6 7 8 9]
The bin edges are: [2. 2.7 3.4 4.1 4.8 5.5 6.2 6.9 7.6 8.3 9. ]
The bar heights are: [ 2 10 23 0 21 27 0 10 6 1]
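To make the empty bins easier to spot, here is a minimal sketch that pairs each bin with its count. As an assumption made only to keep the example self-contained, the 100 values are rebuilt from the bar heights above with np.repeat rather than regenerated from the seeded loop.

```python
import numpy as np

# Rebuild the 100 final values from the counts reported above
# (assumption for illustration; the real list comes from the seeded loop).
values = np.arange(2, 10)                  # 2, 3, ..., 9
counts = [2, 10, 23, 21, 27, 10, 6, 1]     # occurrences of each value
final_tails = np.repeat(values, counts)

hist, bin_edges = np.histogram(final_tails, bins=10)

# List every bin with its count and remember the ones that hold no data.
empty_bins = []
for i, n in enumerate(hist):
    left, right = bin_edges[i], bin_edges[i + 1]
    if n == 0:
        empty_bins.append(i)
    print(f"bin {i}: {left:.1f} to {right:.1f}, count = {n}")

print("empty bins:", empty_bins)
```

The two bins flagged as empty are the ones that lie entirely between two consecutive integers, which is where the gaps show up in the plot.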
I got the course's nice abutted bars using:
plt.hist( final_tails ,
          bins = np.arange( min( final_tails ) - 0.5 ,
                            max( final_tails ) + 1.5 , 1.0 ) ,
          edgecolor="k" )
plt.show()
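As a quick check of why this works, the sketch below builds the same half-integer edges and counts the data with np.histogram instead of plotting. The data are again rebuilt from the counts printed earlier, which is an assumption for the sake of a self-contained example.

```python
import numpy as np

# Data rebuilt from the counts printed earlier (assumption: used instead of
# re-running the seeded loop, so the example stands on its own).
final_tails = np.repeat(np.arange(2, 10), [2, 10, 23, 21, 27, 10, 6, 1])

# Edges at half-integers (1.5, 2.5, ..., 9.5), so each bin is centred on one integer.
edges = np.arange(final_tails.min() - 0.5, final_tails.max() + 1.5, 1.0)
hist, _ = np.histogram(final_tails, bins=edges)

print(edges)
print(hist)
```

With one integer per bin there are 8 bins rather than 10, and none of them is empty, so the bars abut.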
I have not posted this as the answer, as the credit goes to saibhaskar and ImportanceOfBeingErnest, who provided the details.
But I do wonder whether this need to customize the bin edges might be because the scheme for automatic bin edges has changed between the creation of the course material and now.
Upvotes: 3
Views: 5864
Reputation: 467
The histogram shows the frequency of the values in each bin, so a blank appears wherever a bin contains no occurrences.
Your list (final_tails) has the data:
[3, 6, 4, 5, 4, 5, 3, 5, 4, 6, 6, 8, 6, 4, 7, 5, 7, 4, 3, 3, 4, 5, 8, 5, 6, 5, 7, 6, 4, 5, 8, 5, 8, 4, 6, 6, 3, 4, 5, 4, 7, 8, 9, 4, 3, 4, 5, 6, 4, 2, 6, 6, 5, 7, 5, 4, 5, 5, 6, 7, 6, 6, 6, 3, 6, 3, 6, 5, 6, 5, 6, 4, 6, 6, 3, 4, 4, 2, 4, 5, 4, 6, 6, 6, 8, 4, 6, 5, 7, 4, 6, 5, 4, 6, 7, 3, 7, 4, 5, 7]
Upvotes: 1
Reputation: 339480
The minimum and maximum of your data are 2 and 9, respectively. Dividing this range into 10 bins means each bin is 0.7 wide. We can compute the edges, which are 2, 2.7, 3.4, 4.1, 4.8, etc.
print(min(final_tails), max(final_tails))
# 2 9
step = (max(final_tails)-min(final_tails))/10
print(step)
# 0.7
edges = np.linspace(min(final_tails), max(final_tails), 10+1)
print(edges)
# [2.0 2.7 3.4 4.1 4.8 5.5 6.2 6.9 7.6 8.3 9.0 ]
Since your data consists only of integers, some bins contain no data, e.g. the bin between 4.1 and 4.8. Hence that bin's bar is missing from the plot.
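The same reasoning can be checked from the minimum, maximum and bin count alone. The sketch below uses a hypothetical helper, empty_bins (not part of NumPy or matplotlib), and assumes every integer between the minimum and maximum occurs in the data, which is the case here.

```python
import numpy as np

def empty_bins(lo, hi, bins):
    """Indices of equal-width bins over [lo, hi] that contain no integer.

    Hypothetical helper for illustration; assumes every integer from
    lo to hi is present in the data.
    """
    integers = np.arange(lo, hi + 1)   # every value the data can take
    hist, _ = np.histogram(integers, bins=bins, range=(lo, hi))
    return np.flatnonzero(hist == 0).tolist()

print(empty_bins(2, 9, 10))   # bins 0.7 wide
print(empty_bins(2, 9, 8))    # bins 0.875 wide, shown for comparison
```

With 10 bins of width 0.7 there are only 8 integers to go round, so at least two bins must stay empty. With 8 bins each one happens to contain exactly one integer.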
I suspect that the image you show from the course was produced by different code from what you show here.
Upvotes: 1