Reputation: 1629
I have to divide an unsigned long int by a size_t (returned by calling size() on a container) like this:
vector<string> mapped_samples;
vector<double> mean;
vector<unsigned long> feature_sum;
/* elaboration here */
mean.at(index) = feature_sum.at(index) /mapped_samples.size();
But this way an integer division takes place and I lose the fractional part, which is no good.
Therefore, I can do:
mean.at(index) = feature_sum.at(index) / double(mapped_samples.size());
But this way feature_sum.at(index) is automatically converted to double (as a temporary copy) and I could lose precision. How can I tackle this? Do I have to use some library?
There could be precision loss when converting the unsigned long to double (because the unsigned long value could be larger than the maximum double). The unsigned long value is the sum of the features (positive values). There can be 1000000 samples or more per feature, and the sum of the feature values can be enormous. The maximum value of a feature is 2000, so the sum can reach 2000*1000000 or more.
(I'm using C++11)
Upvotes: 3
Views: 2997
Reputation: 45414
You cannot do better (if you want to store the result as a double) than the simple
std::uint64_t x=some_value, y=some_other_value;
auto mean = double(x)/double(y);
because the relative accuracy of the truncated form of the correct result computed with float128
auto improved = double(float128(x)/float128(y))
is typically the same (for typical input; there may be rare inputs where improvement is possible). Both have a relative error dictated by the length of the mantissa of double (53 bits). So the simple answer is: either use a more accurate type than double for your mean or forget about this issue.
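To put the 53-bit mantissa in context for the numbers in the question: an integer converts to double without any rounding as long as its significant bits (after dropping trailing zero bits) span at most 53 bits. The sketch below checks that condition, assuming double is IEEE 754 binary64; the helper name exact_in_double is made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: true when v is exactly representable as a double,
// assuming double is IEEE 754 binary64 (53-bit significand).
bool exact_in_double(std::uint64_t v) {
    if (v == 0) return true;
    // Trailing zero bits are absorbed by the exponent, so strip them.
    while ((v & 1u) == 0) v >>= 1;
    // What remains must fit into the 53-bit significand.
    return v < (std::uint64_t(1) << 53);
}
```

With the question's figures, exact_in_double(2000ULL * 1000000ULL) is true, so for sums of that size the conversion itself loses nothing. The smallest value for which it fails is 2^53 + 1.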
To see the relative accuracy, let us assume that
x=a*(1+e); // a=double(x)
y=b*(1+f); // b=double(y)
where e, f are of the order 2^-53.
Then the 'correct' quotient is, to first order in e and f,
(x/y) = (a/b) * (1 + e - f)
Converting this to double incurs another relative error of the order of 2^-53, i.e. of the same order as the error of (a/b), the result of the naive
mean = double(x)/double(y).
Of course, e and f can conspire to cancel, in which case more accuracy can be gained by the methods suggested in other answers, but typically the accuracy cannot be improved.
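The estimate above can be checked empirically. The following is only a sketch: it uses long double as a stand-in for a wider reference type (on some platforms long double is no wider than double, in which case the gap is simply zero), and the function names naive_mean and relative_gap are invented for this example.

```cpp
#include <cmath>
#include <cstdint>

// The naive quotient discussed above.
double naive_mean(std::uint64_t x, std::uint64_t y) {
    return double(x) / double(y);
}

// Relative difference between the naive quotient and a long double
// reference. Assumes x != 0 and y != 0.
double relative_gap(std::uint64_t x, std::uint64_t y) {
    const long double ref =
        static_cast<long double>(x) / static_cast<long double>(y);
    const long double got = naive_mean(x, y);
    return static_cast<double>(std::fabs((got - ref) / ref));
}
```

For large inputs such as relative_gap(18446744073709551557ULL, 1000003ULL) the result should stay within a few multiples of 2^-53, in line with the first-order estimate.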
Upvotes: 2
Reputation: 206567
You could use:
// Grab the integral part of the division
auto v1 = feature_sum.at(index)/mapped_samples.size();
// Grab the remainder of the division
auto v2 = feature_sum.at(index)%mapped_samples.size();
// The remainder is smaller than the divisor, so converting it to double is unlikely to lose precision
mean.at(index) = v1 + static_cast<double>(v2)/mapped_samples.size();
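Wrapped into a standalone function, the same idea could look like the sketch below. The name mean_split and the use of std::uint64_t are choices made for this example, and count is assumed to be nonzero.

```cpp
#include <cstdint>

// Sketch of the quotient/remainder approach: the integral part is computed
// with exact integer arithmetic, and only the remainder (always smaller
// than count) goes through a floating-point division.
// Assumes count != 0.
double mean_split(std::uint64_t sum, std::uint64_t count) {
    const std::uint64_t whole = sum / count;  // integral part
    const std::uint64_t rest  = sum % count;  // remainder, 0 <= rest < count
    return static_cast<double>(whole)
         + static_cast<double>(rest) / static_cast<double>(count);
}
```

For example, mean_split(7, 2) gives 3.5 and mean_split(2000000000ULL, 1000000ULL) gives 2000.0.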
Upvotes: 3
Reputation: 20080
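You could try std::div (from <cstdlib>), along these lines. Note that std::div is only overloaded for signed integer types, so the unsigned operands are cast first; this assumes both values fit in a long long.
// Quotient and remainder in one call
auto dv = std::div(static_cast<long long>(feature_sum.at(index)), static_cast<long long>(mapped_samples.size()));
double mean = dv.quot + dv.rem / double(mapped_samples.size());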
Upvotes: 4