Reputation: 279
I have a string like the one below:
30750 [uber-SubtaskRunner] INFO org.apache.hadoop.hive.ql.exec.Task - Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
Now I want to extract the numbers from it and add them up using a shell script. Basically, I want the sum of the number of mappers and the number of reducers. Splitting the string on the space character does not seem to be working for me; any regex pattern that does the job will do.
Thanks
Upvotes: 0
Views: 236
Reputation: 9650
You can do it with a Perl one-liner:
perl -ne '$s += $_ for /number of .*?: (\d+)/g; END { print "$s\n" }'
Demo: https://ideone.com/8ghKE5
An awk version (this needs GNU awk, because the three-argument match() is a gawk extension):
awk '{while(match($0,"number of [^:]+: ([[:digit:]]+)",a)){s+=a[1];$0=substr($0,RSTART+RLENGTH)}}END{print s}'
Demo: https://ideone.com/Hbccm9
Explanation:
The while() loop sums up all the numbers extracted by the regex in match() into the variable s.
The match() function tries to find the pattern number of [^:]+: ([[:digit:]]+) in the current input string ($0) and stores the capture groups (subpatterns in parentheses, ([[:digit:]]+) in our case) in the array a.
The pattern number of [^:]+: ([[:digit:]]+) matches the substring "number of <something not containing ':'>: <sequence of digits>" and captures the <sequence of digits> (which is effectively the number we're looking for) into capture group one.
s+=a[1] adds to s the number that the regex in match() captured in group one.
$0=substr($0,RSTART+RLENGTH) removes from the input string $0 everything up to and including the substring that matched the pattern in match(), so that match() searches further along the string on the next iteration.
The END{...} block just prints the sum collected in s.
Upvotes: 1
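For awk implementations other than GNU awk, the loop explained above can be sketched with the standard two-argument match(), which only sets RSTART and RLENGTH. This is a rough, untuned variant rather than the answer's own code: the function name sum_counts is made up for illustration, and [0-9] is used in place of [[:digit:]] on the assumption that some older awks do not support POSIX character classes.

```shell
#!/bin/sh
# Sum every "number of <label>: <digits>" value found on stdin.
# sum_counts is a hypothetical helper name, not part of the original answer.
sum_counts() {
    awk '{
        line = $0
        # Two-argument match() is POSIX; it sets RSTART and RLENGTH.
        while (match(line, /number of [^:]+: [0-9]+/)) {
            hit = substr(line, RSTART, RLENGTH)    # e.g. "number of mappers: 1"
            sub(/^.*: /, "", hit)                  # strip the label, keep the digits
            s += hit
            line = substr(line, RSTART + RLENGTH)  # continue after this match
        }
    }
    END { print s + 0 }'                           # s + 0 prints 0 when nothing matched
}

log='30750 [uber-SubtaskRunner] INFO org.apache.hadoop.hive.ql.exec.Task - Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1'
printf '%s\n' "$log" | sum_counts   # prints 2
```

Working on a copy (line) instead of $0 avoids re-splitting the record into fields on every iteration, and the sum keeps accumulating across lines if several log lines are piped in.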