Reputation: 971

Remove lines containing non-numeric entries in bash

I have a sample data file (sample.log) that has entries

0.0262
0.0262
0.7634
5.7262
0.abc02

I need to filter out the lines that contain non-numeric data; in the sample above, that is the last entry.

I tried this

sed 's/[^0-9]//g' sample.log

This strips the non-numeric characters instead of removing the line, and it also removes the decimal points.

How can I keep the original values while eliminating the non-numeric lines? Can I use tr or awk?

Upvotes: 2

Answers (5)

Zumo de Vidrio

Reputation: 2091

You can do it easily with grep if you discard any line that contains any letter:

grep -v '[a-z]' test
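
A stricter variant, as a rough sketch: rather than rejecting lines that contain a lowercase letter, keep only lines that consist entirely of a plain decimal number. The temporary file and the pattern are assumptions for illustration, not part of the approach above, and the pattern does not accept exponent notation.

```shell
# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' 0.0262 0.0262 0.7634 5.7262 0.abc02 > "$tmp"

# -E: extended regular expressions; -x: the pattern must match the whole line.
# Accepts an optional sign, digits, and an optional fractional part.
result=$(grep -Ex '[+-]?[0-9]+(\.[0-9]+)?' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

With the question's data this should print the four numeric lines and drop 0.abc02.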

Upvotes: 1

dawg

Reputation: 103744

In awk:

awk '/^[[:digit:].]+$/{print $0}' file

Or negate the character class (adding + and - in case your numbers are signed):

awk '/[^[:digit:].+-]/{next} 1' file

Or, same logic with sed:

sed '/[^[:digit:].+-]/d' file
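
As a quick check of what the negated character class does and does not catch, here is a small sketch. The input lines are made up for illustration. Because the pattern only tests which characters appear, not how they are arranged, a line such as 1.2.3 passes the filter.

```shell
# Hypothetical input: two valid numbers, one malformed number, one line with letters.
tmp=$(mktemp)
printf '%s\n' 0.0262 -5 1.2.3 0.abc02 > "$tmp"

# Delete every line containing a character other than a digit, '.', '+' or '-'.
kept=$(sed '/[^[:digit:].+-]/d' "$tmp")
printf '%s\n' "$kept"

rm -f "$tmp"
```

Only 0.abc02 is removed here; 1.2.3 survives, which is the kind of case a number-aware test handles better.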

Ed Morton's solution is robust. Given:

$ cat nums.txt
    1e6          
.1e6
1E6
.001
.
0.001
.1.2
1abc2
0.0
-0
-0.0
0x123
0223
011
NaN
inf
abc

$ awk '$0==($0+0) {printf "%s => %f\n", $0, ($0+0)}
       $0!=($0+0) {notf[$0]++;}
       END {for (e in notf) print "\""e"\""" not a float"}' nums.txt
        1e6           => 1000000.000000
.1e6 => 100000.000000
1E6 => 1000000.000000
.001 => 0.001000
0.001 => 0.001000
0.0 => 0.000000
-0 => 0.000000
-0.0 => 0.000000
0x123 => 291.000000
0223 => 223.000000
011 => 11.000000
NaN => nan
inf => inf
".1.2" not a float
"1abc2" not a float
"abc" not a float
"." not a float

Upvotes: 1

Ed Morton

Reputation: 203209

You can't do this job robustly with sed or grep or any other tool that doesn't understand numbers; you need awk instead:

$ cat file
1e3
1f3
0.1.2.3
0.123

$ awk '$0==($0+0)' file
1e3
0.123
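
To see the same test on the question's data, here is a minimal sketch (the temporary file is just a stand-in for sample.log). As I understand awk's comparison rules, input that looks like a number is compared numerically with $0+0 and so matches; anything else is compared as a string against the text form of its numeric value, which differs for a line like 0.abc02 (it becomes "0").

```shell
# Recreate the sample data from the question in a temporary file.
tmp=$(mktemp)
printf '%s\n' 0.0262 0.0262 0.7634 5.7262 0.abc02 > "$tmp"

# Print a line only when it equals its own numeric value.
# Numeric-looking lines compare as numbers; other lines compare as strings and fail.
kept=$(awk '$0==($0+0)' "$tmp")
printf '%s\n' "$kept"

rm -f "$tmp"
```

The original text of each kept line is printed unchanged, so the decimal values are retained.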

The best you could do with a sed solution would be:

$ sed '/[^0-9.]/d; /\..*\./d' file
0.123

which removes all lines that contain anything other than a digit or period, then all those that contain 2 or more periods (e.g. an IP address), but it still can't recognize exponent notation as a number.

If you have hex input data and GNU awk (see @dawg's comment below):

$ echo "0x123" | awk --non-decimal-data '$0==($0+0){printf "%s => %f\n", $0, ($0+0)}'
0x123 => 291.000000

Upvotes: 4

potong

Reputation: 58371

This might work for you (GNU sed):

sed '/[^0-9.]/d' file

However, this may give a false positive on, say, an IP address, i.e. it allows more than one period.

Using your test data:

sed '/^[0-9]\.[0-9]\{4\}$/!d' file

This keeps only lines that consist of a single digit, followed by a period, followed by exactly 4 digits.
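
If the values are not always one digit plus four decimals, the anchored pattern can be loosened. The following is only a sketch under that assumption; the input lines are made up, and it still rejects signs and exponent notation.

```shell
# Hypothetical input: plain decimals, an IP address, and a line with letters.
tmp=$(mktemp)
printf '%s\n' 0.0262 12.5 192.168.0.1 0.abc02 5.7262 > "$tmp"

# Keep only lines of the form digits, one period, digits; delete everything else.
# [0-9][0-9]* is the portable basic-regex spelling of "one or more digits".
kept=$(sed '/^[0-9][0-9]*\.[0-9][0-9]*$/!d' "$tmp")
printf '%s\n' "$kept"

rm -f "$tmp"
```

Because the pattern is anchored at both ends and allows exactly one period, the IP address no longer slips through.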

Upvotes: 0

YtRen

Reputation: 13

Use:

$ sed -i '/.*[a-z].*/d' sample.log

Upvotes: 0
