D Prat

Reputation: 362

Greater than float with awk

I found some questions about this, but none of them really answered my question.

I have a tabulated file like this:

2   10610   0   0   0   0.0105292
2   10649   0   0   0   0.041959
2   10682   0   0   0   0.0449746
2   10705   0   0   0   0.0441639
2   10797   2   0   0   0.0342728
2   10955   0   0   0   0.0136986
2   10957   0   0   0   0.0135135
2   11124   0   0   0   0.0583367
2   11336   1   0   0   0.0219502

and I used this command:

awk '{if ($6 > 0.4) print $6}' myfile

And here is the output:

0.0105292
0.041959
0.0449746
0.0441639
0.0342728
0.0136986
0.0135135
0.0583367
0.0219502

It returns every value in the 6th column. Here I should get no results, since none of the values satisfies the condition. So I guess awk is not treating $6 as a float.

I tried other syntax but I still have the same problem.

I also tried the command on the first column, and there it works.

PS: I'm on Mac OS X.

Edit: It does work when I use awk '{print $6}', though.

Upvotes: 4

Views: 4145

Answers (1)

Ed Morton

Reputation: 203995

It's your locale setting (see https://www.gnu.org/software/gawk/manual/gawk.html#Locales and specifically https://www.gnu.org/software/gawk/manual/gawk.html#Locale-influences-conversions). Explicitly setting LC_ALL=C is one way to solve the problem:

LC_ALL=C awk '{if ($6 > 0.4) print $6}' myfile
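As a quick sanity check, here is a minimal sketch that runs the same comparison on three rows copied from the question. The variable names (sample, none, some) and the second threshold of 0.04 are only illustrative assumptions, added so that one run produces matches and the other does not:

```shell
# Three rows copied from the question; column 6 holds the floats.
sample='2 10610 0 0 0 0.0105292
2 10649 0 0 0 0.041959
2 11124 0 0 0 0.0583367'

# With the C locale forced, "." is the decimal point, so $6 is compared as a number.
none=$(printf '%s\n' "$sample" | LC_ALL=C awk '$6 > 0.4 { print $6 }')
some=$(printf '%s\n' "$sample" | LC_ALL=C awk '$6 > 0.04 { print $6 }')

printf 'above 0.4:  [%s]\n' "$none"
printf 'above 0.04: [%s]\n' "$some"
```

With the C locale in effect, the first list should come back empty and the second should contain only 0.041959 and 0.0583367.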

What's happening is that you're trying to use a decimal point of . but your locale (typical in most European countries and many others) uses , instead. So when your input contains:

0.0105292

awk does not recognize it as looking like a number in your locale, so instead it gets treated as a string. If your input was instead:

0,0105292

THEN awk would recognize it as a number (so this is the other way to solve your problem - use commas as the decimal point in your input).
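If you want to confirm which decimal point your locale uses, something like the sketch below may help. It assumes the POSIX locale utility is installed and accepts the decimal_point keyword; the variable names are just for illustration:

```shell
# Decimal point under the current environment (may be "," in many European locales).
current=$(locale decimal_point)

# Decimal point once the C locale is forced; in the C locale this is ".".
forced=$(LC_ALL=C locale decimal_point)

printf 'current: %s\nforced:  %s\n' "$current" "$forced"
```

If the first line shows a comma, input written with "." will not look numeric to awk under that locale.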

So to awk, your code:

$6 > 0.4

is a string "0.0105292" being compared to a number 0.4 (per POSIX the . is always the decimal point when used in the code) and per this comparison table from the gawk manual:

        +----------------------------------------------
        |       STRING          NUMERIC         STRNUM
--------+----------------------------------------------
        |
STRING  |       string          string          string
        |
NUMERIC |       string          numeric         numeric
        |
STRNUM  |       string          numeric         numeric
--------+----------------------------------------------

we see that the type of comparison performed when a string is compared to a number (or anything else) is a string comparison.

So in your original code the string "0.0105292" is being string-compared with the number 0.4, and awk is apparently deciding that the former is greater than the latter (I don't know exactly why; it may be some other locale effect).
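To see the difference between the two kinds of comparison in isolation, here is a small sketch run under LC_ALL=C so the result does not depend on your locale. The last comparison is only a guess at what could be happening: it assumes the number 0.4 ends up as the string "0,4" in a comma locale, which nothing above confirms.

```shell
# String constants force a string comparison; numeric constants force a numeric one.
result=$(LC_ALL=C awk 'BEGIN {
    as_strings = ("10" < "9")        # 1: the character "1" sorts before "9"
    as_numbers = (10 < 9)            # 0: ten is not less than nine
    # Assumption for illustration only: 0.4 rendered as "0,4" in a comma locale.
    guess = ("0.0105292" > "0,4")    # 1: "." sorts after "," in the C locale
    print as_strings, as_numbers, guess
}')
echo "$result"
```

This prints 1 0 1: the same digits give opposite answers depending on whether awk compares them as strings or as numbers.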

Upvotes: 11
