Rainer Tenhunen

Reputation: 293

Using awk to grab only numbers from a string

Background:
I have a column that should receive user input in the form "Description text ref12345678". I have existing scripts that grab the reference number, but unfortunately some users enter it incorrectly, so instead of "ref12345678" it can be "ref 12345678", "RF12345678", "abcd12345678" or some other variation. Naturally, the wrong formatting breaks some of the triggered scripts. For now I can't control the user input to this field, so I want the scripts later in the pipeline to extract just the number.

At the moment I'm stripping the letters with awk '{gsub(/[[:alpha:]]/, "")}; 1', but substitution seems like an inefficient solution. (I know I can also do this with sed -n 's/.*[a-zA-Z]//p' and tr -d '[[:alpha:]]', but they are essentially the same, and I want awk for the additional programmability.)
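A minimal sketch of the substitution approach from the question, applied to two of the variants listed above. The helper names strip_letters and strip_non_digits are hypothetical. Deleting letters alone leaves any spaces behind, whereas deleting everything that is not a digit leaves only the number (but would also merge in any digits that happen to appear in the description text).

```shell
#!/bin/sh
# strip_letters: the question's one-liner; deletes letters only,
# so spaces and punctuation survive.
strip_letters() {
  awk '{ gsub(/[[:alpha:]]/, "") }; 1'
}

# strip_non_digits: hypothetical variant; deletes every character
# that is not a digit 0-9.
strip_non_digits() {
  awk '{ gsub(/[^0-9]/, "") }; 1'
}

printf '%s\n' "ref 12345678" | strip_letters     # prints " 12345678" (leading space kept)
printf '%s\n' "ref 12345678" | strip_non_digits  # prints "12345678"
```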

The question is: is there a way to make awk either print only the numbers in a string, or treat the numeric parts of a string as fields by setting the delimiters accordingly? Or is substitution really the most efficient solution for this problem?

So, in summary: how do I use awk on the output of echo "ref12345678" to print only "12345678", without substitution?

Upvotes: 26

Views: 112848

Answers (5)

Eugene W.

Reputation: 446

grep works perfectly:

$ echo "../Tin=300_maxl=9_rdx=1.1" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?'
300
9
1.1

Step by step explanation:

-E

Use extended regex.

-o

Print only the matched parts of each line, one match per line, instead of the whole line.

[+-]?[0-9]+([.][0-9]+)?

Match numbers which are identified as:

[+-]?

An optional leading sign

[0-9]+

One or more digits

([.][0-9]+)?

An optional period followed by one or more digits.
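A short sketch applying the same grep command to the input variants from the question. The wrapper name extract_numbers is hypothetical; note that -o is supported by GNU and BSD grep but is not required by POSIX.

```shell
#!/bin/sh
# extract_numbers: hypothetical wrapper around the answer's grep command.
# Prints every match on its own line.
extract_numbers() {
  grep -Eo '[+-]?[0-9]+([.][0-9]+)?'
}

for s in "Description text ref12345678" "ref 12345678" "RF12345678" "abcd12345678"; do
  printf '%s\n' "$s" | extract_numbers   # each iteration prints 12345678
done
```

Because of the optional sign, an input such as ref-12345678 yields -12345678, and any digits inside the description text are printed as additional lines.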

It is convenient to put the output in a bash array:

arr=($(echo "../Tin=300_maxl=9_rdx=1.1" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?'))

and then use it like this:

Tin=${arr[0]}
maxl=${arr[1]}
rdx=${arr[2]}
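A variant of the array approach, assuming bash 4 or later: mapfile reads one match per array element, so the results are not subject to the word splitting and pathname expansion that the unquoted $(...) in arr=($(...)) undergoes. This matters little for plain numbers, but avoids surprises if the pattern is later changed.

```shell
#!/usr/bin/env bash
# Assumes bash 4+ (mapfile, process substitution) and a grep that supports -o.
s="../Tin=300_maxl=9_rdx=1.1"

# One match per array element; no word splitting or glob expansion.
mapfile -t arr < <(printf '%s\n' "$s" | grep -Eo '[+-]?[0-9]+([.][0-9]+)?')

Tin=${arr[0]}
maxl=${arr[1]}
rdx=${arr[2]}
printf 'Tin=%s maxl=%s rdx=%s\n' "$Tin" "$maxl" "$rdx"   # Tin=300 maxl=9 rdx=1.1
```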

Upvotes: 3

Jansen Simanullang

Reputation: 2175

In awk you can combine multiple conditions in one pattern, like this:


awk '$3 ~ /[[:digit:]]/ && $3 !~ /[[:alpha:]]/ && $3 !~ /[[:punct:]]/ {print $3}'

This prints the third field only if it contains at least one digit and no letters or punctuation; the !~ operator means "does not match".
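A minimal sketch, assuming (as the answer does) that the reference sits in the third whitespace-separated field. For ordinary ASCII input the three tests can be collapsed into one anchored regex. Because this selects whole fields rather than extracting digits, a field such as ref12345678 that still contains letters is skipped rather than cleaned. The helper name third_if_numeric is hypothetical.

```shell
#!/bin/sh
# third_if_numeric: hypothetical helper. Prints field 3 only when it
# consists entirely of the digits 0-9.
third_if_numeric() {
  awk '$3 ~ /^[0-9]+$/ { print $3 }'
}

printf '%s\n' "Description text 12345678" | third_if_numeric     # prints 12345678
printf '%s\n' "Description text ref12345678" | third_if_numeric  # prints nothing
```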

Upvotes: 5

Alex S

Reputation: 169

You can also try the following with awk, assuming there is only one number in the string and the string starts with it:

awk '{print ($0+0)}'

This converts the entire string to a number. awk's string-to-number conversion uses only the leading numeric part of the string (after any leading blanks) and ignores the rest. For example:

echo "19 trees"|awk '{print ($0+0)}'

will produce:
19
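A short sketch showing where this one-liner works and where it does not. awk converts a string to a number using only its leading numeric part, so a string that starts with letters, such as ref12345678, becomes 0. The wrapper name to_number is hypothetical.

```shell
#!/bin/sh
# to_number: hypothetical wrapper around the answer's one-liner.
to_number() {
  awk '{ print ($0 + 0) }'
}

printf '%s\n' "19 trees" | to_number      # 19
printf '%s\n' "12345678 ref" | to_number  # 12345678
printf '%s\n' "ref12345678" | to_number   # 0: the string does not start with a number
```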

Upvotes: 16

iruvar

Reputation: 23364

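Another option is to specify a regular expression that matches non-digits as the field separator. This works in GNU awk, and in any POSIX awk, which treats a multi-character FS as a regular expression: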

awk -F '[^0-9]+' '{for (i = 1; i <= NF; i++) if ($i != "") print $i}'
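A sketch of the field-separator approach on a few inputs. Every run of digits becomes a field, and the empty-field test skips the empty field that appears when the line starts (or ends) with a non-digit. The wrapper name digit_runs and the input "batch 7 ref12345678" are made up for illustration; the latter shows that digits in the description text are printed too.

```shell
#!/bin/sh
# digit_runs: hypothetical wrapper. FS matches one or more non-digits,
# so each run of digits is a field; empty fields are skipped.
digit_runs() {
  awk -F '[^0-9]+' '{ for (i = 1; i <= NF; i++) if ($i != "") print $i }'
}

printf '%s\n' "ref 12345678" | digit_runs          # 12345678
printf '%s\n' "batch 7 ref12345678" | digit_runs   # 7, then 12345678
```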

Upvotes: 2

Kent

Reputation: 195049

If awk is not a must:

grep -o '[0-9]\+'

Example:

kent$ echo "ref12345678"|grep -o '[0-9]\+'
12345678

With awk, for your example:

kent$ echo "ref12345678"|awk -F'[^0-9]*' '$0=$2'     
12345678

Upvotes: 43
