A.SDR

Reputation: 187

Does the precision of a float change from Python to C?

I am trying to scale and standardize some data (a subset of the original data):

4576, 3687, 3786, 4149
4497, 3707, 3782, 4126
4449, 3712, 3787, 4097
4499, 3679, 3800, 4093
4497, 3660, 3857, 4139
4463, 3691, 3851, 4116
4393, 3712, 3782, 4108
4364, 3765, 3785, 4099
4400, 3846, 3822, 4152
4645, 3905, 3804, 4253

These are the mean values of each column ...

4400.60009766,  3274.76000977,  3234.88989258,  3402.25000000

... and these are the scales:

2164.33007812,  2516.58349609,  2280.71508789,  2321.07519531

To do the task in Python I call transform() on a scaler that was fitted beforehand:

data_scaled = std_scale.transform(data)   # std_scale contains the mean and the scale values

In C I did the following:

void transform(uint16_t* in_data, unsigned size, double* mean, double* scale, double* out_data) {
   unsigned i, j;

   /* size rows of 4 columns each, stored row-major in flat arrays */
   for (i = 0; i < size; i++) {
      for (j = 0; j < 4; j++) {
          out_data[i * 4 + j] = ((double) in_data[i * 4 + j] - mean[j]) / scale[j];
      }
   }
}
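A minimal driver for the function above might look like the following sketch; the main() here and the two-row subset are assumptions for illustration, reusing the mean and scale values quoted earlier (compile it together with the transform() definition above):

#include <stdint.h>
#include <stdio.h>

/* transform() as defined above, with flat row-major indexing */
void transform(uint16_t* in_data, unsigned size, double* mean, double* scale, double* out_data);

int main(void) {
    /* First two rows of the sample data, stored row-major */
    uint16_t data[2 * 4] = {
        4576, 3687, 3786, 4149,
        4497, 3707, 3782, 4126
    };
    double mean[4]  = { 4400.60009766, 3274.76000977, 3234.88989258, 3402.25000000 };
    double scale[4] = { 2164.33007812, 2516.58349609, 2280.71508789, 2321.07519531 };
    double out[2 * 4];

    transform(data, 2, mean, scale, out);

    for (unsigned i = 0; i < 2; i++)
        printf("%.8f  %.8f  %.8f  %.8f\n",
               out[i*4 + 0], out[i*4 + 1], out[i*4 + 2], out[i*4 + 3]);
    return 0;
}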

But some of my results differ in the least-significant digits, which seems likely to result from a difference in floating-point precision:

python result

0.08104119   0.16380939   0.24163917   0.32172590
0.04454030   0.17175667   0.23988533   0.31181669
0.02236253   0.17374349   0.24207763   0.29932249   
0.04546437   0.16063046   0.24777760   0.29759914
0.04454030   0.15308055   0.27276975   0.31741756
0.02883105   0.16539884   0.27013901   0.30750835
-0.00351152   0.17374349   0.23988533   0.30406168   
-0.01691059   0.19480379   0.24120072   0.30018416
-0.00027727   0.22699028   0.25742370   0.32301840
0.11292173   0.25043476   0.24953143   0.36653271

C result

0.08104120  0.16380938  0.24163917  0.32172590
0.04454030  0.17175667  0.23988534  0.31181670     
0.02236253  0.17374349  0.24207763  0.29932249
0.04546437  0.16063047  0.24777760  0.29759915  
0.04454030  0.15308055  0.27276976  0.31741755
0.02883105  0.16539884  0.27013901  0.30750835
-0.00351152  0.17374349  0.23988534  0.30406167
-0.01691059  0.19480378  0.24120071  0.30018416
-0.00027727  0.22699028  0.25742370  0.32301840
0.11292173  0.25043476  0.24953143  0.36653272

The Python code represents the data as float32. Am I missing something?

Upvotes: 2

Views: 434

Answers (1)

John Bollinger

Reputation: 181008

If indeed the Python code is performing its calculations in 32-bit floating point, then it is very likely that you are right that there is a difference in precision here. The C language does not specify the details of floating-point representation, but these days it is rare to encounter an implementation in which double does not correspond to IEEE 754 binary double precision format -- a 64-bit format with 53-bit mantissa.
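One way to see what a given C implementation actually provides is to inspect the parameters in <float.h>; a minimal sketch, assuming a hosted implementation, is:

#include <float.h>
#include <stdio.h>

int main(void) {
    /* A 53-bit significand and 15 decimal digits indicate IEEE 754 binary64 */
    printf("DBL_MANT_DIG = %d, DBL_DIG = %d\n", DBL_MANT_DIG, DBL_DIG);
    /* For comparison, single precision: 24-bit significand, 6 decimal digits */
    printf("FLT_MANT_DIG = %d, FLT_DIG = %d\n", FLT_MANT_DIG, FLT_DIG);
#ifdef __STDC_IEC_559__
    /* Defined when the implementation claims IEC 60559 (IEEE 754) arithmetic */
    printf("__STDC_IEC_559__ is defined\n");
#endif
    return 0;
}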

Note also, however, that the differences you are observing are in the 7th significant decimal digit. This is at the very limit of the precision of (32-bit) IEEE 754 binary single-precision format, and perhaps beyond. You only get 6-7 decimal digits. A difference at that precision could arise simply from differing order of operations, different timing of rounding, or some similar computational difference. If indeed you are performing at least one of your computations in that format, then there is no reason to expect the two computations to match to the precision you present.
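For example, a small sketch that standardizes the first sample value (4576, with the first column's mean and scale) in both float and double illustrates how quickly the two formats part ways; the exact digits printed depend on your platform and compiler:

#include <stdio.h>

int main(void) {
    /* First value of the first row, with its column mean and scale */
    double xd = 4576.0, md = 4400.60009766, sd = 2164.33007812;
    float  xf = 4576.0f, mf = 4400.60009766f, sf = 2164.33007812f;

    double rd = (xd - md) / sd;   /* 64-bit computation */
    float  rf = (xf - mf) / sf;   /* 32-bit computation */

    printf("double: %.10f\n", rd);
    printf("float : %.10f\n", rf);
    printf("diff  : %.3e\n", rd - rf);
    return 0;
}

The two printed results typically agree only to about 6-7 significant digits, which is the same region where the question's Python and C outputs diverge.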

Upvotes: 2
