Reputation: 89
I have two columns as follows.
ABC =
4.1103 25.5932
5.0852 31.2679
6.0021 15.9020
5.8495 21.4804
4.3245 19.9674
5.9378 38.3452
6.9460 8.8233
7.4568 44.7429
5.7358 32.7608
5.3510 35.2645
5.1657 54.6566
5.1381 44.1870
4.1566 101.8947
5.7310 -3.0565
5.5496 28.3637
4.5672 -1.7736
4.5805 11.8384
4.7948 33.7640
3.9901 6.0607
4.4203 17.7308
4.2712 -1.5834
4.8808 -2.3123
5.9004 -0.4623
5.3929 1.1477
5.6594 6.9741
5.5114 11.3982
5.4715 5.9189
5.0021 6.2561
4.1576 10.3207
6.1025 3.4654
3.9960 6.6892
5.6938 3.8429
5.2416 7.7513
7.0922 2.6871
5.3277 14.0617
6.1350 4.0316
6.0211 -20.3587
6.7399 14.0224
5.0818 102.6360
5.6444 24.3167
6.2542 19.8522
6.2862 24.3430
5.6452 -6.4020
5.4561 14.7813
4.7934 9.4639
3.8523 32.0766
3.9878 8.5313
4.5232 42.0309
4.2489 -12.0325
6.0413 -5.5464
4.9334 -3.2520
4.1349 20.9038
4.2329 20.6303
4.2009 31.8840
4.0624 48.5402
4.7674 28.6595
4.0767 4.7767
4.0971 34.8460
3.8442 24.0209
5.2471 38.8815
6.0241 59.3785
6.9743 6.5027
7.8732 4.5422
4.3094 68.4340
4.5601 -4.2946
4.6140 109.4510
4.5862 71.8387
5.2210 66.1310
4.3835 32.7592
6.1432 36.3832
5.4624 13.7891
5.2129 40.1301
3.8987 67.2705
6.6328 15.0286
8.0786 -7.3078
4.8968 -6.7754
4.1200 4.5333
4.1098 -3.3204
4.0373 26.4890
3.8467 48.8121
7.7795 -2.3606
6.9553 21.3609
6.2635 24.4985
6.1518 -1.4200
4.9115 11.5784
5.5908 13.1351
7.0117 -2.8297
5.2193 38.6937
6.0786 16.9453
6.8229 14.0907
8.0385 13.6228
8.6596 -1.4478
6.3257 8.0361
6.9223 -14.2179
3.8337 15.5773
4.0039 -24.1494
4.6332 17.9308
6.3684 11.3398
5.8592 4.0367
6.9040 12.1495
7.8524 -0.0432
8.3545 10.8865
9.3946 20.4614
4.3015 25.9674
4.4782 21.9045
4.1994 39.2286
4.3499 22.1004
4.3652 33.6220
4.2026 -5.8153
5.1330 6.4996
5.3118 33.7835
4.2002 -3.1917
3.8285 32.1016
3.9485 21.6358
3.8688 21.7830
4.0494 24.7914
4.0869 10.6577
4.6699 8.4756
5.1199 11.1885
5.1831 8.6163
4.5560 8.2806
4.4886 4.8017
4.5618 5.9434
4.1135 12.8942
4.1377 22.1423
I split 'x' into bins of equal width and computed the corresponding bin mean 'yy', as shown below:
x=ABC(:,1);
y=ABC(:,2);
counter=1;
for i=min(x):0.3:max(x)
bin= x>i & x<= i+0.3;
xbin(counter,1) = mean(x(bin));
yy(counter,1) = mean(y(bin));
counter = counter+1;
end
plot(x,y,'ro'); hold on
plot(xbin,yy,'bo-');
Here a 'bin' is defined for a certain range of 'x' (please see the for loop). The output contains 'xbin' from 'x' and the mean 'yy' of the 'y' values corresponding to each 'xbin'. My concern is that each mean value 'yy' should be obtained from approximately the same number of data points. If a 'bin' does not contain enough 'y' data points, the mean value 'yy' should be NaN. Can someone please help with this? Thanks.
Upvotes: 0
Views: 854
Reputation: 5014
You are basically looking for a histogram with non-uniform bins or a histogram with equal counts.
The simplest case for a non-uniform histogram is to sort the N values in x and separate the sorted vector into k bins, i.e. each bin will have N/k of the samples (you can also set the ratio by specifying N = ck).
Instead of linearly spacing the range of x, you do a linear split of the sorted vector (which gives a non-linear, non-uniform partition of the original range).
In your case it would look like this:
[sortedX, indeX] = sort(x);   % indeX maps sorted positions back into x
nVals = length(x);            % N
nPerBin = 10;                 % c, samples per bin (so k = N/c bins)
% linear split of the sorted vector: first index of each bin, plus an end marker
% (always appending nVals+1 keeps the last sample when nVals = m*c + 1)
stepX = [1:nPerBin:nVals, nVals+1];
% counting and binning on the indexed vector
for i = 1 : length(stepX)-1
    bin = indeX(stepX(i):stepX(i+1)-1);
    xbin(i,1) = mean(x(bin));
    yy(i,1) = mean(y(bin));
end
To calculate the actual range (i.e. the edges of the histogram) you can use the midpoint between the max in bin i and the min in bin i+1. You can add something like the following in your loop:
% calculate the range
maxX(i) = max(x(bin));
minX(i) = min(x(bin));
The desired (non-linear) range is then:
rangeX = [min(x) maxX(1:end-1) + (minX(2:end) - maxX(1:end-1))/2 max(x)];
while your original (linear) range is:
rangeX_OP = min(x):0.3:max(x);
You can use histc to verify the equal counts (for rangeX) and non-equal counts (for rangeX_OP). This is how the counts would look (for random x in a similar range to yours and c = 10 counts per bin). Top is the linear spacing of the range, bottom is the non-linear.
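The check described above could be sketched roughly as follows. This is only an illustration under stated assumptions: x is synthetic stand-in data (not the data from the question), c = 10 is the per-bin count, and the names starts, countsEq and countsLin are made up for this example.

```matlab
% Sketch only: x is synthetic stand-in data, not the data from the question
rng(0);                                  % fixed seed so the run is repeatable
x = 3.8 + 5.6*rand(120,1);               % hypothetical values in a similar range
c = 10;                                  % samples per bin (assumed)

[~, idx] = sort(x);
starts = [1:c:numel(x), numel(x)+1];     % first index of each bin, plus an end marker
nB = numel(starts) - 1;
maxX = zeros(1, nB);
minX = zeros(1, nB);
for i = 1:nB
    bin = idx(starts(i):starts(i+1)-1);  % indices of the samples in bin i
    maxX(i) = max(x(bin));
    minX(i) = min(x(bin));
end

% equal-count edges (midpoints between neighbouring bins) and linear edges
rangeX    = [min(x), maxX(1:end-1) + (minX(2:end) - maxX(1:end-1))/2, max(x)];
rangeX_OP = min(x):0.3:max(x);

% histc counts edges(k) <= x < edges(k+1); its last element counts only
% x == edges(end), so the maximum of x is reported separately from the last bin
countsEq  = histc(x, rangeX);
countsLin = histc(x, rangeX_OP);

subplot(2,1,1); bar(countsLin); title('linear spacing');
subplot(2,1,2); bar(countsEq);  title('equal-count spacing');
```

With distinct values in x, every entry of countsEq except the last two should equal c; the last two entries together make up the final bin, because histc puts the maximum in its own slot.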
Upvotes: 1
Reputation: 20205
The question isn't totally clear, but have you tried using the histogram function, hist? It seems that it can do a lot of the work for you:
% choose the bin locations
xcenters = min(x):0.3:max(x);
% compute counts in each bin (count the x samples, since the bins are defined on x)
[counts, ctrs] = hist(x, xcenters);
% set any with too few samples to NaN
count_min = 3;
counts(counts < count_min) = NaN;
% plot -- either as a histogram,
figure(1)
bar(ctrs, counts)
% or as a line plot (note that the line is broken wherever a value is NaN)
figure(2)
plot(ctrs, counts)
You are able to specify the input bin centres here, but to define the edges of the bins instead, look at histc.
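As a rough sketch of the histc route, the second output of histc (the bin index of each sample) can be combined with accumarray to get the mean of y per bin, with NaN wherever a bin has too few samples. The data here is synthetic, and edges, binIdx and the 0.15 centre offset are illustrative choices rather than anything from the question.

```matlab
% Sketch only: synthetic stand-in data; bin width and threshold are assumed values
rng(1);                                   % fixed seed so the run is repeatable
x = 3.8 + 5.6*rand(125,1);
y = 20 + 25*randn(125,1);

edges = min(x):0.3:(max(x) + 0.3);        % bin edges covering every x
[counts, binIdx] = histc(x, edges);       % count per bin, and the bin index of each sample

count_min = 3;                            % minimum number of samples for a valid mean
% mean of y within each bin; empty bins are filled with NaN
yy = accumarray(binIdx(:), y, [numel(edges) 1], @mean, NaN);
yy(counts(:) < count_min) = NaN;          % too few samples: discard the mean

ctrs = edges(:) + 0.15;                   % bin centres for plotting
plot(x, y, 'ro'); hold on
plot(ctrs, yy, 'bo-');                    % the line is broken at NaN bins
```

Using edges rather than centres makes the bin membership explicit, which is what the NaN threshold needs.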
Upvotes: 1
Reputation: 7925
Check for the number of 1s in bin for each iteration of your for-loop. If that number is below a certain threshold, assign NaN to yy:
x=ABC(:,1);
y=ABC(:,2);
counter=1;
nbinmin = 5; % this is the threshold
for i=min(x):0.3:max(x)
bin= x>i & x<= i+0.3;
xbin(counter,1) = mean(x(bin));
% check if the number of 1s in bin is less than the threshold
if nnz(bin) < nbinmin
yy(counter,1) = NaN;
else
yy(counter,1) = mean(y(bin));
end
counter = counter+1;
end
Upvotes: 1