Reputation: 115
I have a small test dataset:
0.1 0.2 0.3 0.4 0.5 1.1 1.2 1.3 1.4 1.5 0.1 0.2 0.3 0.4 0.5
I'd like to get the frequency count between the minimum 0.1 and the maximum 1.5, with a bin (step) size of 0.1. I have tested this in Matlab, Octave, Origin, and an AWK script, but I got a completely different result from each.
Matlab:
data = [0.1 0.2 0.3 0.4 0.5 1.1 1.2 1.3 1.4 1.5 0.1 0.2 0.3 0.4 0.5];
edge = 0.1:0.1:1.5;
count = histc(data, edge);
result is:
count = [2 4 0 2 2 0 0 0 0 0 1 1 1 1 1]
Octave:
data = [0.1 0.2 0.3 0.4 0.5 1.1 1.2 1.3 1.4 1.5 0.1 0.2 0.3 0.4 0.5];
edge = 0.1:0.1:1.5;
count = histc(data, edge);
result is:
count = [2 2 2 2 2 0 0 0 0 0 1 2 0 1 1]
Origin: use the "frequency count" command with min=0.1, max=1.5, step size=0.1.
result is:
count = [2 4 0 2 2 0 0 0 0 0 2 1 1 1]
AWK:
{...;count[data/0.1]++;} ...
result is:
count = [2 4 0 2 2 0 0 0 0 0 2 0 2 0 1]
Why do I get these different results? Am I doing something wrong, or have I misunderstood the concept of "frequency count"? I don't think any of the above results are correct. Could you please tell me what I should do?
Upvotes: 3
Views: 487
Reputation: 1105
A quick workaround is to shift the edges by half a bin, so that every data value falls in the middle of a bin instead of exactly on an edge.
Matlab:
data = [0.1 0.2 0.3 0.4 0.5 1.1 1.2 1.3 1.4 1.5 0.1 0.2 0.3 0.4 0.5];
edge = 0.05:0.1:1.55;
count = histc(data, edge)
results:
Columns 1 through 9
2 2 2 2 2 0 0 0 0
Columns 10 through 16
0 1 1 1 1 1 0
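Note: there is a spurious extra bin at the end, because histc returns one count per edge and there are now 16 edges. The last entry only counts values exactly equal to the last edge (1.55), so it is 0 here.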
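For readers without Matlab, here is a minimal Python sketch of the same half-bin shift. The function name `count_with_shifted_edges` and its parameters are made up for this illustration and do not correspond to any of the tools discussed in the question.

```python
from bisect import bisect_right

data = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5,
        0.1, 0.2, 0.3, 0.4, 0.5]

def count_with_shifted_edges(values, lo, step, nbins):
    """Count values into nbins bins centred on lo, lo + step, ...

    Hypothetical helper: each edge sits half a step below a bin centre,
    so a value that is off by a tiny rounding error still lands in the
    bin it was meant for.
    """
    edges = [lo - step / 2 + i * step for i in range(nbins + 1)]
    counts = [0] * nbins
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the last edge <= v
        if 0 <= i < nbins:              # ignore values outside the range
            counts[i] += 1
    return counts

counts = count_with_shifted_edges(data, 0.1, 0.1, 15)
print(counts)  # [2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Because the function uses `nbins + 1` edges for `nbins` bins, there is no extra trailing entry as in the histc output.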
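As Paul R suggested, it comes down to precision and rounding: the edges generated by 0.1:0.1:1.5 are floating-point values that can end up slightly above or below the data values they are supposed to equal, so a value may land on either side of an edge. You'll have to look into each tool's frequency count function to see how it builds and compares the edges. If I were you, I would multiply everything by 10 and convert the values to integers.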
data = int8(data .* 10)   % int8 rounds to the nearest integer
edge = 1:15;
count = histc(data, edge)
results:
Columns 1 through 9
2 2 2 2 2 0 0 0 0
Columns 10 through 15
0 1 1 1 1 1
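The same integer trick can be sketched in Python. This is only an illustration under the assumption that the data has one decimal place (hence `scale = 10`); the variable names are chosen for the example.

```python
data = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5,
        0.1, 0.2, 0.3, 0.4, 0.5]

scale = 10  # assumption: 10 ** (number of decimal places in the data)

# round() (not truncation) maps 0.1 -> 1, 0.2 -> 2, ..., 1.5 -> 15
keys = [round(v * scale) for v in data]

# One count per integer bin 1..15; integer comparison is exact
counts = [keys.count(k) for k in range(1, 16)]
print(counts)  # [2, 2, 2, 2, 2, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The important detail is rounding to the nearest integer rather than truncating, since a scaled value can come out a hair below the integer it should equal.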
What matters is how the human interprets the result, not the machine. If you know you multiplied by 10^(your precision) and converted to integers, you don't need to care how the machine stores the values internally. If your data contains irrational numbers and you still see errors, check how floating-point numbers are represented (http://en.wikipedia.org/wiki/Floating_point).
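To see the representation issue directly, here is a short Python sketch. It shows IEEE 754 double behaviour in general; how each tool in the question actually generates its edges internally is not something this example claims to reproduce.

```python
from decimal import Decimal

# The double closest to 0.1 is not exactly one tenth
stored = Decimal(0.1)
print(stored)

# An edge built as 0.1 + 2 * 0.1 is not the same double as the literal 0.3
edge = 0.1 + 2 * 0.1
same_as_literal = (edge == 0.3)
print(same_as_literal)  # False

# Dividing by the step gives a value just below 3, so truncating and
# rounding disagree about which bin 0.3 belongs to
ratio = 0.3 / 0.1
truncated = int(ratio)
rounded = round(ratio)
print(ratio, truncated, rounded)  # 2.9999999999999996 2 3
```

If the edge ends up slightly above 0.3, the value 0.3 is counted in the bin below, which is consistent with the doubled count in the second bin of some of the results above.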
Cheers.
Upvotes: 5