Ueffes

Reputation: 164

Multiplying column by -1 alters last digits of results using AWK

I have got pretty big data files and need to multiply my 2nd column by -1. The awk command looks like this:

awk '{printf "%.4f  %.16f\n", $1, $2*-1}' file > file_reverted

This works perfectly fine for some rows, but for others it somehow rounds down at the end instead of just appending a 0.

fine:

-0.094  0.0950083965247825  |  -0.0940  -0.0950083965247825
-0.0935 0.104569121568904   |  -0.0935  -0.1045691215689040
-0.093  0.114995049351066   |  -0.0930  -0.1149950493510660       

wrong:

-0.0795 1.08856685504934    |  -0.0795  -1.0885668550493399
-0.079  1.16919985559016    |  -0.0790  -1.1691998555901599

After 16 decimals this is not too much of a problem but it still falsifies my results a little.

Upvotes: 0

Views: 448

Answers (2)

First, read What Every Programmer Should Know About Floating-Point Arithmetic. Or, if you prefer, What Every Computer Scientist Should Know About Floating-Point Arithmetic.

Whether you see (-)1.0885668550493399 or (-)1.08856685504934 has nothing to do with the operation you're performing. The numbers are the same except for the sign.

$ echo 1.08856685504934 | awk '{printf "%.16f %.16f\n", $1, -$1}'      
1.0885668550493399 -1.0885668550493399

What's happening is that you're printing the numbers with more precision than is stored. The difference between the two printed numbers is 10^-16, which is less than 2^-52 times the number, so it cannot be expressed in the 52-bit mantissa of awk's floating-point numbers. Printing those extra digits is what allows a value to be written out and parsed back in without losing precision. 1.0885668550493399 and 1.08856685504934 are two decimal representations of the same stored number.
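A quick way to check this yourself, as a sketch that assumes your awk stores numbers as IEEE 754 doubles (the usual case unless arbitrary precision is switched on, for example with gawk -M):

```shell
# Both decimal literals should parse to the same double, so == holds.
awk 'BEGIN {
    a = 1.0885668550493399
    b = 1.08856685504934
    if (a == b) print "same"; else print "different"
}'
# prints: same
```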

Your results are not being falsified. You're getting the same result. Make sure to calculate the precision of your result: it's probably a lot less than the 52-bit precision of a floating-point value, as each stage of the calculation performs some rounding.
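If all you want is to get the input digits back, one option is to print no more significant digits than the input carried. This is only a sketch, and it assumes the second column has at most 15 significant digits, as in the sample rows. Note that %.15g drops trailing zeros and may switch to exponent notation for very small or very large values, so the column width is no longer fixed.

```shell
# Two rows taken from the question; %.15g prints at most 15 significant digits.
printf '%s\n' '-0.0795 1.08856685504934' '-0.094 0.0950083965247825' |
    awk '{ printf "%.4f  %.15g\n", $1, -$2 }'
# prints:
# -0.0795  -1.08856685504934
# -0.0940  -0.0950083965247825
```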

Upvotes: 3

Partha Lal

Reputation: 541

If you don't need to change the number of decimal places compared to the input then doing this should work:

awk '{if($2<0){sub(/^-/,"",$2)}else if($2>0){$2="-"$2};print $1,$2}'

Because the second column is only edited as a string, its floating-point representation is never printed, which gets rid of the precision problem.
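For illustration, here is a sketch of the same string-based idea run on two rows, written so that only a leading minus sign is touched (an exponent such as 1e-5 would be left alone). The first row is from the question; the second has its sign flipped here purely to exercise the other branch.

```shell
printf '%s\n' '-0.0795 1.08856685504934' '-0.079 -1.16919985559016' |
    awk '{
        if ($2 < 0)      sub(/^-/, "", $2)   # negative: drop the leading minus
        else if ($2 > 0) $2 = "-" $2         # positive: prepend a minus
        print $1, $2                         # digits pass through unchanged
    }'
# prints:
# -0.0795 -1.08856685504934
# -0.079 1.16919985559016
```

The first column is also passed through as text, so it is not padded to four decimals as in the printf version.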

Upvotes: 1
