Mohit Rane

Reputation: 279

Regex pattern to get numbers from string

I have a string like the one below:

30750 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.exec.Task  - Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1

Now I want to extract the numbers from it and add them up using a shell script. Basically, I want the sum of the number of mappers and the number of reducers. Splitting the string on the space character doesn't seem to be working for me; any regex pattern that does the job would be fine.

Thanks

Upvotes: 0

Views: 236

Answers (1)

Dmitry Egorov

Reputation: 9650

You can do it with a Perl one-liner:

perl -ne '$s += $_ foreach /number of .*?: (\d+)/g; END { print "$s\n" }'

Demo: https://ideone.com/8ghKE5


An awk version (requires GNU awk, because the three-argument form of match() is a gawk extension):

awk '{while(match($0,"number of [^:]+: ([[:digit:]]+)",a)){s+=a[1];$0=substr($0,RSTART+RLENGTH)}}END{print s}'

Demo: https://ideone.com/Hbccm9

Explanation:

  • The while() loop adds every number extracted by the regex in match() to the variable s.
    • In the loop condition:
      • The match() function searches the current input string ($0) for the pattern number of [^:]+: ([[:digit:]]+) and stores the capture groups (the subpatterns in parentheses, ([[:digit:]]+) in our case) in the array a.
      • The regex number of [^:]+: ([[:digit:]]+) matches the substring "number of <something not containing ':'>: <sequence of digits>" and captures the <sequence of digits> (the number we are looking for) into capture group one.
    • In the loop body:
      • s+=a[1] adds to s the number captured in group one by the regex in match().
      • $0=substr($0,RSTART+RLENGTH) removes from the input string $0 everything up to and including the substring that matched the pattern, so that match() searches further along the string on the next iteration.
  • The finalization block (END{...}) prints the sum accumulated in s.
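
If GNU awk is not available, the same match-and-strip loop can probably be written with the two-argument match(), which only sets RSTART and RLENGTH and has no capture array. The following is a minimal sketch under that assumption, not a tested replacement for the one-liner above; the function name sum_job_counts and the variables rest and hit are made up for illustration.

```shell
# Hypothetical helper: sums every "number of <label>: <digits>" count read from stdin.
sum_job_counts() {
  awk '
    {
      rest = $0
      # Two-argument match() reports the match position via RSTART and RLENGTH.
      while (match(rest, /number of [^:]+: [0-9]+/)) {
        hit = substr(rest, RSTART, RLENGTH)    # e.g. "number of mappers: 1"
        sub(/^.*: /, "", hit)                  # keep only the trailing digits
        sum += hit
        rest = substr(rest, RSTART + RLENGTH)  # continue after this match
      }
    }
    END { print sum + 0 }                      # "+ 0" prints 0 when nothing matched
  '
}

line='30750 [uber-SubtaskRunner] INFO  org.apache.hadoop.hive.ql.exec.Task  - Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1'
printf '%s\n' "$line" | sum_job_counts   # prints 2
```

Working on a copy (rest) instead of overwriting $0 leaves the original record intact, which may matter if the script is later extended to print the line as well.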

Upvotes: 1
