Reputation: 293
Background:
I have a column that should get user input in the form "Description text ref12345678". I have existing scripts that grab the reference number, but unfortunately some users enter it incorrectly, so instead of "ref12345678" it can be "ref 12345678", "RF12345678", "abcd12345678", or any other variation. Naturally, the wrong formatting breaks some of the triggered scripts.
For now I can't control the user input to this field, so I want the scripts later in the pipeline to extract just the number.
At the moment I'm stripping the letters with awk '{gsub(/[[:alpha:]]/, "")}; 1', but substitution seems like an inefficient solution. (I know I can also do this with sed -n 's/.*[a-zA-Z]//p' or tr -d '[[:alpha:]]', but they are essentially the same, and I want awk for additional programmability.)
The question is: is there a way to make awk either print only the numbers from a string, or set delimiters around the numeric items in a string? (Or is substitution really the most efficient solution for this problem?)
So in summary: how do I use awk so that echo "ref12345678" prints only "12345678", without substitution?
Upvotes: 26
Views: 112848
Reputation: 446
grep works perfectly:
$ echo "../Tin=300_maxl=9_rdx=1.1" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?'
300
9
1.1
Step-by-step explanation:
-E: Use extended regex.
-o: Return only the matches, not the surrounding context.
[+-]?[0-9]+([.][0-9]+)?: Match numbers, which are identified as:
  [+-]?: An optional leading sign.
  [0-9]+: One or more digits.
  ([.][0-9]+)?: An optional period followed by one or more digits.
It is convenient to put the output in an array:
arr=($(echo "../Tin=300_maxl=9_rdx=1.1" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?'))
and then use it like this:
Tin=${arr[0]}
maxl=${arr[1]}
etc.
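The array assignment above is bash-specific and relies on unquoted word splitting. As a rough sketch of an alternative for plain POSIX sh, the matches can be read straight into named variables. This assumes a grep that supports -o (GNU and BSD grep do) and that the string holds exactly three numbers; the variable name input is only illustrative.

```shell
# Sample string taken from the answer above; "input" is an illustrative name.
input="../Tin=300_maxl=9_rdx=1.1"

# grep -o prints one match per line; tr joins them with spaces so that
# read can split them into the three named variables.
read -r Tin maxl rdx <<EOF
$(printf '%s\n' "$input" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' | tr '\n' ' ')
EOF

echo "Tin=$Tin maxl=$maxl rdx=$rdx"
```

If the number of matches varies, looping over the grep output with while read -r is probably the safer choice.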
Upvotes: 3
Reputation: 2175
In awk you can combine multiple conditions, for example:
($3 ~ /[[:digit:]]+/ && $3 !~ /[[:alpha:]]/ && $3 !~ /[[:punct:]]/) {print $3}
This prints the third field only when it contains digits and no letters or punctuation; !~ means "does not match".
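To make the behaviour of this filter concrete, here is a small sketch run against made-up three-column input (the sample lines and the function name digits_only_third_field are assumptions for illustration). Note that it selects fields that are already purely numeric; it would skip a value like "ref123" rather than pull the number out of it.

```shell
# Print the third field only if it has digits and no letters or punctuation.
digits_only_third_field() {
    awk '$3 ~ /[[:digit:]]+/ && $3 !~ /[[:alpha:]]/ && $3 !~ /[[:punct:]]/ { print $3 }'
}

# Hypothetical sample rows: only the first and last have a clean numeric $3.
printf '%s\n' 'a b 12345678' 'a b ref123' 'a b 12-34' 'a b 42' |
    digits_only_third_field
```

With this input, only 12345678 and 42 get through.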
Upvotes: 5
Reputation: 169
You can also try the following with awk, assuming there is only one number in the string and it comes first:
awk '{print ($0+0)}'
This converts the entire string to a number. When awk converts a string to a number it uses only the leading numeric portion, so anything after the number is dropped (a string that does not start with a number, such as "ref12345678", converts to 0). For example:
echo "19 trees"|awk '{print ($0+0)}'
will produce:
19
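A quick sketch of where the numeric-conversion trick does and does not help (the function name to_number is just for illustration). It handles a trailing description, but not the prefixed reference format from the question:

```shell
# Force awk to treat the whole line as a number.
to_number() {
    awk '{ print ($0 + 0) }'
}

echo "19 trees"    | to_number   # leading number is kept: prints 19
echo "ref12345678" | to_number   # no leading number: prints 0
```

So for input like "ref12345678" this approach would need to be combined with something that first removes or skips the prefix.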
Upvotes: 16
Reputation: 23364
Another option (assuming GNU awk) involves specifying a non-numeric regular expression as the field separator:
awk -F '[^0-9]+' '{OFS=" "; for(i=1; i<=NF; ++i) if ($i != "") print($i)}'
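As a rough check, the separator approach can be tried on the malformed variants listed in the question. The wrapper name split_numbers is an assumption for illustration, and the OFS assignment is left out here since print is called with a single argument.

```shell
# Treat every run of non-digits as a field separator and print non-empty fields.
split_numbers() {
    awk -F '[^0-9]+' '{ for (i = 1; i <= NF; ++i) if ($i != "") print $i }'
}

# Input variants taken from the question.
printf '%s\n' 'ref 12345678' 'RF12345678' 'abcd12345678' | split_numbers
```

Each of the three lines yields 12345678. A line with several numbers would print each one on its own line.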
Upvotes: 2
Reputation: 195049
If awk is not a must:
grep -o '[0-9]\+'
Example:
kent$ echo "ref12345678"|grep -o '[0-9]\+'
12345678
With awk, for your example:
kent$ echo "ref12345678"|awk -F'[^0-9]*' '$0=$2'
12345678
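If the goal is to stay in awk and avoid both substitution and a custom field separator, one possible sketch uses match() and substr(), which are part of POSIX awk. The function name first_number is made up for illustration, and the sketch assumes only the first run of digits on each line is wanted.

```shell
# Print the first run of digits on each line; lines without digits print nothing.
first_number() {
    awk 'match($0, /[0-9]+/) { print substr($0, RSTART, RLENGTH) }'
}

echo "ref12345678" | first_number
echo "Description text ref 12345678" | first_number
```

Both calls print 12345678. match() sets RSTART and RLENGTH to the position and length of the match, so the number is sliced out of the string without modifying $0.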
Upvotes: 43