Reputation: 15010

strtonum in awk causes the values to lose precision

I have a text file in the following format.

1   0x5212cb03ca115ac0  0x3665fb5f1ac1
2   0x5212cb03ca115cc0  0x3665fb5f1ac7
3   0x5212cb03ca115ea0  0x3665fb5f1acd
4   0x5212cb03ca1160c0  0x3665fb5f1ad3
5   0x5212cb03ca1162a0  0x3665fb5f1ad9
6   0x5212cb03ca1164c0  0x3665fb5f1ade
7   0x5212cb03ca1166a0  0x3665fb5f1ae4
8   0x5212cb03ca1168a0  0x3665fb5f1aea
9   0x5212cb03ca116aa0  0x3665fb5f1af0
10  0x5212cb03ca116ca0  0x3665fb5f1af6

Command:

awk  '{print $1 "  "strtonum($2)-0x5212cb03ca115ac0 "  "strtonum($3)-0x3665fb5f1ac1 }' output.txt

The output that I get is given below.

1   0     0
2   1024  6
3   2048  12
4   2048  18
5   2048  24
6   3072  29
7   4096  35
8   4096  41
9   4096  47
10  5120  53

As you can see, some values in column 2 repeat themselves (2048 and 4096). This is caused by a loss of precision when using strtonum.

Can someone suggest a way to achieve the same result without this loss of precision?
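For context, the multiples of 1024 in column 2 are consistent with awk holding numbers as IEEE 754 doubles, which carry 53 significant bits. The values in column 2 lie between 2^62 and 2^63, where adjacent doubles are 1024 apart. The following is a minimal sketch of that effect, assuming an awk built on double-precision floats (the default for gawk without --bignum); the function name show_rounding is only illustrative.

```shell
# Illustrative only: near 2^62 adjacent doubles are 1024 apart,
# so adding 512 (exactly half a step) rounds back to 2^62.
show_rounding() {
    awk 'BEGIN { printf "%.0f\n%.0f\n", 2^62, 2^62 + 512 }'
}

show_rounding
```

If both printed lines are identical, the awk in use cannot represent differences smaller than 1024 at this magnitude.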

Upvotes: 3

Answers (3)

rici

Reputation: 241861

It's worth noting that as of version 4.1.0, gawk supports bignums if you provide the --bignum command-line flag (and if gawk was compiled with bignum support). Unfortunately, the Debian/Ubuntu packagers haven't caught up with the new version yet (it was released in May).

Here's what I did to install gawk-4.1.0 on a reasonably stock Ubuntu system:

# Download the source.
$ curl http://ftp.gnu.org/gnu/gawk/gawk-4.1.0.tar.gz > gawk-4.1.0.tar.gz
# Get the needed header files
$ sudo apt-get install libgmp-dev libmpfr-dev
# Unpack the gawk distribution and enter the source directory
$ tar xf gawk-4.1.0.tar.gz
$ cd gawk-4.1.0
# Configure and compile it
$ ./configure
$ make
# Install it (as /usr/local/bin/gawk)
$ sudo make install

# Try it out
$ gawk --bignum '{printf "%2d %8d %8d\n",
                 $1, strtonum($2)-0x5212cb03ca115ac0,
                 strtonum($3)-0x3665fb5f1ac1 }' test.dat 
 1        0        0
 2      512        6
 3      992       12
 4     1536       18
 5     2016       24
 6     2560       29
 7     3040       35
 8     3552       41
 9     4064       47
10     4576       53

(Actually, that's a little misleading. I already had gawk 4.1 installed, but I pretended that I was doing it fresh. Also, now that I think of it, I'd used the .xz file, not the .gz file, but I'm sure both of them decompress to the same thing. The .xz version is half the size.)

Upvotes: 5

Red Cricket

Reputation: 10470

You could use bc. This may not be exactly what you need, but I am sure you can tweak it to get the desired result. Note that bc expects hex digits in uppercase and without the 0x prefix:

$ cat bc_output.txt
obase=10
ibase=16
5212CB03CA115AC0 - 5212CB03CA115AC0 ; 3665FB5F1AC1 - 3665FB5F1AC1
5212CB03CA115CC0 - 5212CB03CA115AC0 ; 3665FB5F1AC7 - 3665FB5F1AC1
5212CB03CA115EA0 - 5212CB03CA115AC0 ; 3665FB5F1ACD - 3665FB5F1AC1
5212CB03CA1160C0 - 5212CB03CA115AC0 ; 3665FB5F1AD3 - 3665FB5F1AC1
5212CB03CA1162A0 - 5212CB03CA115AC0 ; 3665FB5F1AD9 - 3665FB5F1AC1
5212CB03CA1164C0 - 5212CB03CA115AC0 ; 3665FB5F1ADE - 3665FB5F1AC1
5212CB03CA1166A0 - 5212CB03CA115AC0 ; 3665FB5F1AE4 - 3665FB5F1AC1
5212CB03CA1168A0 - 5212CB03CA115AC0 ; 3665FB5F1AEA - 3665FB5F1AC1
5212CB03CA116AA0 - 5212CB03CA115AC0 ; 3665FB5F1AF0 - 3665FB5F1AC1
5212CB03CA116CA0 - 5212CB03CA115AC0 ; 3665FB5F1AF6 - 3665FB5F1AC1
quit

$ bc -l bc_output.txt
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
0
0
512
6
992
12
1536
18
2016
24
2560
29
3040
35
3552
41
4064
47
4576
53

Upvotes: 1

perreal

Reputation: 98068

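Perhaps this can get the job done. It drops the leading hex digits of column 2, which are identical on every line of the sample, so the remaining 32-bit value fits in a double without rounding (this only works while those leading digits stay the same):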

awk  '{print $1 "  "strtonum("0x"substr($2,11))-0xca115ac0 "  "strtonum($3)-0x3665fb5f1ac1 }' input

And the Perl version:

perl -lane '{print join(" ", $F[0], hex($F[1])-0x5212cb03ca115ac0, hex($F[2]) - 0x3665fb5f1ac1)}' input
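Another option, not from the answers above, is to let the shell do the subtraction. This is a rough sketch that assumes a shell with 64-bit integer arithmetic (bash and dash on typical 64-bit systems), values below 2^63, and trusted input, since each field is passed straight into an arithmetic expansion. The function name hex_deltas is hypothetical; the base values are the ones from the question.

```shell
# Hypothetical helper: reads "index hex hex" lines on stdin and prints
# the index followed by each hex column minus its base value.
# Uses integer arithmetic expansion, so no floating-point rounding occurs.
hex_deltas() {
    local base2=0x5212cb03ca115ac0
    local base3=0x3665fb5f1ac1
    local idx col2 col3
    while read -r idx col2 col3; do
        # Skip blank lines
        [ -n "$idx" ] || continue
        printf '%s  %s  %s\n' "$idx" \
            "$(( $col2 - $base2 ))" \
            "$(( $col3 - $base3 ))"
    done
}
```

With the question's file this would be called as hex_deltas < output.txt.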

Upvotes: 2
