Reputation: 971
I have a sample data file (sample.log) that has entries
0.0262
0.0262
0.7634
5.7262
0.abc02
I need to filter out the lines that contain non-numeric data; in the sample above, that is the last entry.
I tried this:
sed 's/[^0-9]//g' sample.log
It removes the non-numeric line but also strips the decimal points. The output I get is:
00262
00262
07634
57262
How can I retain the original values after eliminating the non-numeric lines? Can I use tr or awk?
Upvotes: 2
Views: 4632
Reputation: 2091
You can do it easily with grep if you discard any line that contains any letter:
grep -v '[a-z]' test
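A stricter variant, as a sketch: instead of rejecting lines that contain lowercase letters, keep only lines that consist entirely of an optional sign, digits, and at most one decimal point. The function name keep_decimals is hypothetical, and the pattern assumes plain decimal notation only (no exponents, no hex).

```shell
# keep_decimals FILE: print only the lines that are plain decimal numbers.
# Hypothetical helper; assumes exponent and hex notation are not needed.
keep_decimals() {
    # -E: extended regular expressions
    # -x: the pattern must match the entire line
    # Accepts forms like 12, 12.5, 12. and .5; rejects "." and 1.2.3
    grep -Ex '[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)' "$1"
}
```

Usage: keep_decimals sample.log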
Upvotes: 1
Reputation: 103744
In awk:
awk '/^[[:digit:].]+$/{print $0}' file
Or you can negate that (and add a potential + or - if those appear in your strings):
awk '/[^[:digit:].+-]/{next} 1' file
Or, same logic with sed:
sed '/[^[:digit:].+-]/d' file
Ed Morton's solution is robust. Given:
$ cat nums.txt
1e6
.1e6
1E6
.001
.
0.001
.1.2
1abc2
0.0
-0
-0.0
0x123
0223
011
NaN
inf
abc
$ awk '$0==($0+0) {printf "%s => %f\n", $0, ($0+0)}
$0!=($0+0) {notf[$0]++;}
END {for (e in notf) print "\""e"\""" not a float"}' /tmp/nums.txt
1e6 => 1000000.000000
.1e6 => 100000.000000
1E6 => 1000000.000000
.001 => 0.001000
0.001 => 0.001000
0.0 => 0.000000
-0 => 0.000000
-0.0 => 0.000000
0x123 => 291.000000
0223 => 223.000000
011 => 11.000000
NaN => nan
inf => inf
".1.2" not a float
"1abc2" not a float
"abc" not a float
"." not a float
Upvotes: 1
Reputation: 203209
You can't do this job robustly with sed, grep, or any other tool that doesn't understand numbers. You need awk instead:
$ cat file
1e3
1f3
0.1.2.3
0.123
$ awk '$0==($0+0)' file
1e3
0.123
The best you could do with a sed solution would be:
$ sed '/[^0-9.]/d; /\..*\./d' file
0.123
which removes all lines that contain anything other than a digit or period, and then all lines that contain 2 or more periods (e.g. an IP address). It still can't recognize exponent notation as a number.
If you have hex input data and GNU awk (see @dawg's comment below):
$ echo "0x123" | awk --non-decimal-data '$0==($0+0){printf "%s => %f\n", $0, ($0+0)}'
0x123 => 291.000000
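As a sketch of how the $0==($0+0) test applies to the data in the question: awk treats an input line that looks like a number as numeric, so the comparison is true for it. For any other line the comparison falls back to a string comparison against the converted value, which fails. The wrapper name numeric_lines is hypothetical.

```shell
# numeric_lines FILE: print only the lines awk recognizes as numbers.
# Hypothetical wrapper around the one-liner above.
numeric_lines() {
    # Numeric-looking input: numeric comparison, always equal to itself.
    # Anything else: string comparison of $0 with the text form of ($0 + 0),
    # which differs (e.g. "0.abc02" vs "0"), so the line is dropped.
    awk '$0 == ($0 + 0)' "$1"
}
```

Usage: numeric_lines sample.log prints the four numeric lines and drops 0.abc02. How hex strings, NaN, and inf are treated depends on the awk implementation.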
Upvotes: 4
Reputation: 58371
This might work for you (GNU sed):
sed '/[^0-9.]/d' file
However, this may give a false positive on, say, an IP address, i.e. it allows more than one '.'.
Using your test data:
sed '/^[0-9]\.[0-9]\{4\}$/!d' file
This would only match a digit, followed by a '.', followed by 4 digits.
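If the number of digits is not fixed, the same whitelist idea can be loosened. This is a sketch that assumes every valid value has at least one digit on each side of a single decimal point; the function name keep_fixed_point is hypothetical. Plain integers such as 7 are dropped under that assumption.

```shell
# keep_fixed_point FILE: keep lines of the form DIGITS.DIGITS, delete the rest.
# Hypothetical helper; uses only basic regular expressions for portability.
keep_fixed_point() {
    # [0-9][0-9]* means "one or more digits" in basic regex syntax.
    # !d deletes every line that does NOT match the anchored pattern,
    # so 192.168.0.1 and 0.abc02 are both removed.
    sed '/^[0-9][0-9]*\.[0-9][0-9]*$/!d' "$1"
}
```

Usage: keep_fixed_point sample.log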
Upvotes: 0