Mahi Rahman

Reputation: 33

Mismatch in Histogram.

I am trying to plot histograms from the attached datasets (CSV files). I have two questions (Q2 is the more important one). The related CSV files can be accessed from this link: CSV files

1. Why are the two histograms different even though exactly the same bins and bin sizes are used?

aa = xlsread('LF_NPV_Branch_Run.csv','C2:C828');
bb = xlsread('RES_Cob.csv','A1:CV827');
cc = aa*ones(1,100);
dev=bb-cc;
err_a=dev';

nbins = 20;
bound_n=min([floor(min(min(err_a))/10)*10,-10])
bound_p=max([ceil(max(max(err_a))/10)*10,10])
bins = linspace(bound_n,bound_p,nbins)

hist(err_a, bins)

figure(2)
hist(err_a(:), bins)

2. In figure 2, the tallest bin shows a count of ~38000, but when I count the points in the center bin (around zero) myself, I get 63039, which is more than the limit of the Y axis. What is the reason for this apparent mismatch?

val = dev(dev > bins(10) & dev < bins(11));
size(val)

Upvotes: 1

Views: 192

Answers (1)

Anthony

Reputation: 3793

Normally, if you have multiple questions, you should ask them separately, but I can see that these two questions are closely related.

If you read MATLAB's documentation for hist(x,xbins):

If xbins is a vector of evenly spaced values, then hist uses the values as the bin centers.

So the values in bins are centres, not edges. The edges of the bin centred at bins(10) are the midpoints to the neighbouring centres:

lower=(bins(9)+bins(10))/2
upper=(bins(10)+bins(11))/2

Therefore, to answer your Q2, the result of the following should match the height of the tallest bar shown in figure 2:

val = dev(dev > lower & dev <= upper);
size(val)
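
To check this without the original CSV files, here is a minimal sketch on synthetic data. The variables x and centers are made up for illustration and only stand in for dev and bins; the point is that the count returned by hist agrees with the midpoint-based count, not with the count obtained by treating neighbouring centres as edges.

```matlab
% Synthetic stand-in for dev (assumption: random data, not the asker's files)
rng(0);
x = 10*randn(1, 5000);
centers = linspace(-50, 50, 20);    % plays the role of bins in the question

% With an output argument, hist returns the counts instead of plotting
counts = hist(x, centers);          % centers are used as bin centres

% Edges of the bin centred at centers(10): midpoints to its neighbours
lower = (centers(9)  + centers(10))/2;
upper = (centers(10) + centers(11))/2;
manual = sum(x > lower & x <= upper);

% What the question computed: neighbouring centres used as if they were edges
asEdges = sum(x > centers(10) & x < centers(11));

fprintf('hist: %d, midpoint count: %d, centres-as-edges count: %d\n', ...
        counts(10), manual, asEdges);
```

The first two numbers should agree, while the third generally differs because it covers a different interval.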

If you want bins to be the edges, you should use histogram(err_a(:), bins). See Specify Bin Edges of Histogram.
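
As a rough sketch of the edge-based alternative, histcounts uses the same edge convention as histogram and returns the counts without plotting. The data and the name edges below are made up for illustration; note that 20 edges give 19 bins, and that each bin except the last includes its left edge and excludes its right edge.

```matlab
% Synthetic stand-in for err_a(:) (assumption: random data, not the asker's files)
rng(0);
x = 10*randn(1, 5000);
edges = linspace(-50, 50, 20);      % 20 edges define 19 bins

% Same binning rule as histogram(x, edges), but returns counts only
N = histcounts(x, edges);

% Bin 10 spans [edges(10), edges(11)), which is the interval around zero
manual = sum(x >= edges(10) & x < edges(11));

fprintf('histcounts: %d, manual edge-based count: %d\n', N(10), manual);
```

With edges, a manual count between two neighbouring values of the vector matches the bar height, which is what the calculation in the question assumed.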


Q1:

err_a is a 100x827 matrix; err_a(:) reshapes it into an 82700x1 column vector.

hist(m, bins) treats every column of m as a separate dataset and counts each one independently against the bin centres specified in bins. In your case, err_a has 827 columns, so for each bin centre hist(err_a, bins) draws 827 bars, and that is why there is a cluster of bars at every bin centre. hist(err_a(:), bins), on the other hand, pools all 82700 values and draws only 1 bar per bin centre.
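
A small sketch of the difference, using a made-up 100x5 matrix m in place of err_a so the sizes are easy to read: the matrix call returns one column of counts per data column, and summing those across columns reproduces the pooled counts.

```matlab
% Synthetic stand-in for err_a (assumption: 5 columns instead of 827)
rng(1);
m = randn(100, 5);
centers = linspace(-3, 3, 7);

perColumn = hist(m, centers);       % 7-by-5: counts per bin centre, per column
pooled    = hist(m(:), centers);    % all 500 values counted together

disp(size(perColumn));              % one row per bin centre, one column per data column

% Adding the per-column counts gives the same totals as the pooled histogram
sameTotals = isequal(sum(perColumn, 2), pooled(:));
fprintf('per-column counts add up to pooled counts: %d\n', sameTotals);
```

So the two figures in the question show the same data, once split into 827 series and once pooled into a single series.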

Upvotes: 1
