Phalgun

Reputation: 1494

Floating-point precision difference in Perl 5.16.3 and 5.8.7

The piece of code below gives different output when run with different versions of Perl:

#!/usr/bin/env perl

my $number1 = 2.198696207;
my $number2 = 2.134326286;
my $diff = $number1 - $number2;
print STDOUT "\n 2.198696207 - 2.134326286: $diff\n";

$number1 = 0.449262271;
$number2 = 0.401361096;
$diff = $number1 - $number2;
print STDOUT "\n 0.449262271 - 0.401361096: $diff\n";

Perl 5.16.3:

perl -v

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux
file `which perl`
/sv/app/perx/third-party/bin/perl: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped

 2.198696207 - 2.134326286: 0.0643699210000004
 0.449262271 - 0.401361096: 0.047901175

Perl 5.8.7:

perl -v

 This is perl, v5.8.7 built for i686-linux-thread-multi-64int
file `which perl`
/sv/app/perx/third-party/bin/perl: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), for GNU/Linux 2.2.5, not stripped
 2.198696207 - 2.134326286: 0.0643699209999999
 0.449262271 - 0.401361096: 0.047901175

I have not been able to find any documentation that describes a change in the precision or rounding of floating-point numbers between these two versions.

Upvotes: 4

Views: 735

Answers (4)

Steffen Ullrich

Reputation: 123320

EDIT: thanks to Mark Dickinson for pointing out irregularities in my initial answer; his detective work changed the conclusion. Many thanks also to ikegami for questioning the initial analysis.

In summary: it's because of small differences in the string-to-double conversion, and these differences appear to be caused by the same code behaving differently on 32 bit and 64 bit.

Details

This is perl, v5.8.7 built for i686-linux-thread-multi-64int

This is a Perl built for a 32-bit architecture.

This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux

And this one is built for a 64-bit architecture.

This means these Perl versions are built against different CPU architectures and maybe with different compile-time options. That might result in different precision for floating-point operations. But it might also be related to string-to-double conversions, as was pointed out in comments by ikegami.

For the differences between the architectures, see Problem with floating-point precision when moving from i386 to x86_64, or x87 FPU vs. SSE2 on Wikipedia.

I've done the following tests on the same computer with identical versions of Ubuntu (15.10) inside a LXC container, but one for 32 bit and the other for 64 bit.

# on 32 bit
$ perl -v
This is perl 5, version 20, subversion 2 (v5.20.2) built for i686-linux-gnu-thread-multi-64int
$ perl -V:nvsize
nvsize='8';
$ perl -E 'say 2.198696207-2.134326286'
0.0643699209999999

# on 64 bit
$ perl -v
This is perl 5, version 20, subversion 2 (v5.20.2) built for x86_64-linux-gnu-thread-multi
$ perl -V:nvsize
nvsize='8';
$ perl -E 'say 2.198696207-2.134326286'
0.0643699210000004

This shows that the difference is related neither to the Perl version nor to the size of the floating-point type used. To get more details, we can look at the internal representation of the numbers using unpack('H*', pack('d', $number)). For 2.134326286 the representation is the same on both platforms, i.e. 0xb7e7eaa819130140. But for 2.198696207 we get different representations:

32 bit: 2.198696207 -> 0xe53b7709ee960140
64 bit: 2.198696207 -> 0xe63b7709ee960140

(Only the lowest byte differs: e5 vs e6, i.e. the two values differ in the last bit of the mantissa.)

This means that the internal representation of the number differs between 64 bit and 32 bit. This can be due to different functions being used because of platform-specific optimizations, or because the same function behaves slightly differently on 32 bit and 64 bit. Checking with the libc function atof shows that it returns 0xe53b7709ee960140 on 64 bit too, so it looks like Perl is using a different function for the conversion.
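The two representations above can be decoded outside Perl as well. Here is a small Python sketch (Python floats are also C doubles, so the bit patterns carry over); it uses the hex strings quoted above and shows that the two values differ by exactly one unit in the last place (ULP):

```python
import struct
import math

# Hex strings as printed by Perl's pack("d", ...) on a little-endian x86
# machine (least-significant byte first) — the two representations of
# 2.198696207 reported above.
hex_32bit = "e53b7709ee960140"
hex_64bit = "e63b7709ee960140"

f32 = struct.unpack("<d", bytes.fromhex(hex_32bit))[0]
f64 = struct.unpack("<d", bytes.fromhex(hex_64bit))[0]

# The two doubles are adjacent: they differ by exactly one ULP.
print(f64 - f32 == math.ulp(f32))        # True
# Python's own (correctly rounded) parser lands on one of the two.
print(struct.pack("<d", 2.198696207).hex())
```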

Digging deeper shows that the Perl I have used on both platforms has USE_PERL_ATOF set, which indicates that Perl is using its own implementation of the atof function. The source code for a current implementation of this function can be found in numeric.c in the Perl source (Perl_my_atof2).

Looking at this code, it is hard to see how it could behave differently on 32 and 64 bit. But there is one important platform-dependent value which determines how many digits the atof implementation will accumulate inside an unsigned integer (UV) before folding them into the internal floating-point representation:

#define MAX_ACCUMULATE ( (UV) ((UV_MAX - 9)/10))

Obviously UV_MAX differs between 32-bit and 64-bit builds, so the 32-bit build has to flush the accumulator more often, which results in a different sequence of floating-point additions with potential rounding differences. My guess is that this somehow explains the tiny difference in behavior between 32 bit and 64 bit.
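A simplified model (hypothetical, not Perl's actual code path) of why a two-step conversion — accumulate the digits into an integer, then scale by a power of ten — can land on a different double than a single correctly-rounded conversion:

```python
import math

# Hypothetical two-step parse of "2.198696207":
digits = 2198696207        # all ten digits accumulated exactly in an integer
two_step = digits * 1e-9   # extra rounding step: 1e-9 is itself inexact in binary
one_step = 2.198696207     # single correctly-rounded string-to-double conversion

# The two routes agree to within one unit in the last place,
# but they need not produce the identical bit pattern.
print(abs(two_step - one_step) <= math.ulp(one_step))   # True
```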

Upvotes: 13

ikegami

Reputation: 385789

There are some factors that can make a difference. In order of increasing likelihood in this particular case, they are the following:

  • The two builds might have different floating-point number sizes.

    • If perl -V:nvsize gives 8, that build uses double-precision floating-point numbers.

    • If perl -V:nvsize gives 16, that build uses quadruple-precision floating-point numbers.

  • The C library is used to parse and format numbers. The two builds use different C libraries because of the difference in architecture. (They could also use different C libraries due to different compiler vendors, different installed library versions, etc.) Some libraries are better than others at these conversions (i.e. some are buggy).

  • The instruction set used (x87 FPU vs SSE2) can vary by architecture, and this matters because they perform the operations with different amounts of internal precision. See Steffen Ullrich's answer for more details.
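For comparison — assuming a typical IEEE 754 build — CPython's floats are C doubles too, so the analogous parameters can be inspected like this:

```python
import struct
import sys

# Size of a C double in bytes: the analogue of perl -V:nvsize giving nvsize='8'
print(struct.calcsize("d"))      # 8
# A double carries a 53-bit mantissa...
print(sys.float_info.mant_dig)   # 53
# ...good for 15 decimal digits that reliably round-trip
print(sys.float_info.dig)        # 15
```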

Upvotes: 4

brian d foy

Reputation: 132811

The documentation you want is perlnumber:

Perl can internally represent numbers in 3 different ways: as native integers, as native floating point numbers, and as decimal strings. Decimal strings may have an exponential notation part, as in "12.34e-56". Native here means "a format supported by the C compiler which was used to build perl".

It's up to your C compiler and your compile-time options.

However, you don't have to use native numbers. If you can tolerate the performance hit, you can use bignum to get exact numbers.
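For illustration (using Python's decimal module as a stand-in for Perl's bignum — both do exact decimal arithmetic):

```python
from decimal import Decimal

# Construct from strings so no binary rounding ever happens
diff = Decimal("2.198696207") - Decimal("2.134326286")
print(diff)   # 0.064369921, identical on every platform
```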

Upvotes: 1

Borodin

Reputation: 126722

The problem is certainly the floating-point architecture that each Perl installation is built to use. But do you really need those values to be identical? If so, then you are bound for endless disappointment.

A double-precision (64-bit) floating-point number — which is what both of these builds use, per nvsize='8' — typically holds fifteen to seventeen significant decimal digits, so your program is displaying values right at that limit.

Unless you are trying to compare two floating-point values for equality (which is rarely reliable, even within the same instruction set), the only problem you may have is that none of the values has sufficient accuracy for your purpose.

0.0643699209999999 and 0.0643699210000004 are equal to an accuracy of about fourteen significant digits, which is all you can expect from double-precision arithmetic on any computer or in any language.
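In practice that means comparing with a tolerance rather than with ==; sketched here in Python, whose math.isclose is a standard-library helper for exactly this:

```python
import math

a = 0.0643699209999999   # result from the 32-bit build
b = 0.0643699210000004   # result from the 64-bit build

print(a == b)                             # False: exact equality fails
print(math.isclose(a, b, rel_tol=1e-9))   # True: equal to within 9 digits
```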

Upvotes: 1
