Reputation: 1629

Divide an unsigned long by a size_t and assign the result to a double

I have to divide an unsigned long int by a size_t (returned by size() on a container), like this:

vector<string> mapped_samples;
vector<double> mean;
vector<unsigned long> feature_sum;
/* elaboration here */
mean.at(index) = feature_sum.at(index) /mapped_samples.size();

but this performs an integer division, so I lose the fractional part, which is no good.

Therefore, I can do:

 mean.at(index) = feature_sum.at(index) / double(mapped_samples.size());

But this way feature_sum.at(index) is implicitly converted (via a temporary copy) to double, and I could lose precision. How can I tackle this? Do I have to use some library?

Precision could be lost when the unsigned long is converted to double (the value may be too large to be represented exactly as a double). The unsigned long value is the sum of the feature values, which are all positive. There can be 1000000 samples or more, and the sum can be enormous: the maximum value of a feature is 2000, so the sum can reach 2000*1000000 or more.

(I'm using C++11)

Upvotes: 3

Answers (3)

Walter

Reputation: 45414

You cannot do better (if you want to store the result as a double) than the simple

std::uint64_t x=some_value, y=some_other_value;
auto mean = double(x)/double(y);

because the relative accuracy of the truncated form of the correct result using float128

auto improved = double(float128(x)/float128(y))

is typically the same (for typical input; there may be rare inputs where improvement is possible). Both have a relative error dictated by the length of the mantissa for double (53 bits). So the simple answer is: either use a more accurate type than double for your mean or forget about this issue.
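As a minimal sketch of the "more accurate type" option, the quotient can be kept in long double. This only helps if long double is wider than double on your platform: it typically has a 64-bit mantissa with GCC/Clang on x86, but is the same as double with MSVC. The names mean_long_double and extra_mantissa_bits are made up for this illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: keeps the quotient in long double instead of double.
long double mean_long_double(std::uint64_t sum, std::uint64_t count) {
    assert(count != 0);  // dividing by zero would not give a meaningful mean
    return static_cast<long double>(sum) / static_cast<long double>(count);
}

// Mantissa bits gained over double on this platform
// (0 if long double is just an alias for the double format).
constexpr int extra_mantissa_bits =
    std::numeric_limits<long double>::digits - std::numeric_limits<double>::digits;
```

If extra_mantissa_bits is 0, this buys nothing, and the result is still rounded as soon as it is stored back into a vector<double>.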

To see the relative accuracy, let us assume that

x=a*(1+e);   // a=double(x)
y=b*(1+f);   // b=double(y)

where e, f are of the order 2^-53.

Then the 'correct' quotient is to first order in e and f

(x/y) = (a/b) * (1 + e - f)

Converting this to double incurs another relative error of the order of 2^-53, i.e. of the same order as the error of (a/b), the result of the naive

mean = double(x)/double(y).

Of course, e and f can conspire to cancel, in which case more accuracy can be gained by the methods suggested in other answers, but typically the accuracy cannot be improved.
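To check whether the conversion error e (or f) is actually zero for a given input, one possible test is whether the integer's significant bits fit into 53 bits. This is a sketch that assumes an IEEE-754 double with a 53-bit mantissa, and the helper name is_exact_as_double is hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: true if x can be stored in a double without rounding,
// assuming an IEEE-754 double with a 53-bit mantissa.
bool is_exact_as_double(std::uint64_t x) {
    if (x == 0) return true;
    // Trailing zero bits are absorbed by the exponent, so strip them first.
    while ((x & 1u) == 0) x >>= 1;
    // What remains must fit into the 53 mantissa bits.
    return x < (std::uint64_t{1} << 53);
}
```

For sums around 2000*1000000 = 2e9 (roughly 2^31) this returns true, so double(x) is exact in that range and the only rounding left is the one in the division itself.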

Upvotes: 2

R Sahu

Reputation: 206567

You could use:

// Grab the integral part of the division
auto v1 = feature_sum.at(index)/mapped_samples.size();

// Grab the remainder of the division
auto v2 = feature_sum.at(index)%mapped_samples.size();

// v2 is smaller than size(), so converting it to double is unlikely to lose precision
mean.at(index) = v1 + static_cast<double>(v2)/mapped_samples.size();
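The same idea, wrapped in a self-contained function as a sketch. The name mean_from_sum and its parameters are made up for this example, and it assumes count is non-zero:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: mean of `count` samples whose total is `sum`.
double mean_from_sum(std::uint64_t sum, std::size_t count) {
    assert(count != 0);                      // integer division by zero is undefined
    const std::uint64_t quot = sum / count;  // integral part, computed exactly
    const std::uint64_t rem  = sum % count;  // remainder, always smaller than count
    // Only the small remainder goes through a floating-point division.
    return static_cast<double>(quot)
         + static_cast<double>(rem) / static_cast<double>(count);
}
```

It would be called as mean.at(index) = mean_from_sum(feature_sum.at(index), mapped_samples.size());. Note that the final addition still rounds to double, so the result cannot be more precise than a double allows.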

Upvotes: 3

Severin Pappadeux

Reputation: 20080

You could try to use std::div

Along the lines

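// std::div is only overloaded for signed types, so cast first (assumes both values fit in long long)
auto dv = std::div(static_cast<long long>(feature_sum.at(index)), static_cast<long long>(mapped_samples.size()));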

double mean = dv.quot + dv.rem / double(mapped_samples.size());

Upvotes: 4
