Narabhut

Reputation: 837

Extracting numbers from a string and calculating percentage in Bash

I'm running a command-line tool that returns results like this:

data {   
  metric: 0   
  metric: 1234.5
  metric: 230499
  metric: 234234
} 
data {   
  metric: 0   
  metric: 6789  
  metric: 23526   
  metric: 234634767 
}

I'd like to calculate 1234.5/6789, that is, the ratio between the second metric lines of the two results. The numbers can be decimals, and the results will always be in that order. Is this possible with grep/sed?

Upvotes: 0

Views: 1434

Answers (5)

dawg

Reputation: 104024

Here is a Perl solution.

Given:

$ echo "$tgt"
data {   
  metric: 0   
  metric: 1234.5
  metric: 230499
  metric: 234234
} 
data {   
  metric: 0   
  metric: 6789  
  metric: 23526   
  metric: 234634767 
}

You can use a regex in Perl's 'slurp' mode (-0777 reads the whole input as a single string) to find the pair you want:

$ echo "$tgt" | perl -0777 -lne '
@a=/^data\s+\{\s+(?:metric:[\s\d.]+){1}metric:\s+(\d+(?:\.\d+)?)/gm;
print $a[0]/$a[1]
'
0.181838267786125

The number inside the braces in (?:metric:[\s\d.]+){1} is how many metric lines to skip in each block, so it selects which pair is captured. With 1, as here, the second metric of each block is captured: 1234.5 and 6789.

Upvotes: 0

ivan_pozdeev

Reputation: 36036

grep/sed cannot perform arithmetic evaluation, nor do they have the ability to set state variables, so no, it isn't possible with them alone. They aren't designed for anything beyond search and replace. It can be achieved with stunts that couple them with head/bc/etc., but that is highly inconvenient and fragile.

This is possible with awk (the code is tailored to be production-grade so it validates the input and adheres to the DRY principle):

function error(m){print m " at line " FNR ":`" $0 "'">"/dev/stderr";_error=1;exit 1;}
BEGIN{brace=0; #brace level
index_=0; #record index
v1="+NaN";v2=v1; #values; if either is not reassigned, the result will be NaN
first_section=0; #1st section ended
second_section=0; #2nd section ended
record_pattern="[[:space:]]*metric:[[:space:]]*([[:digit:]]+(\\.[[:digit:]]+)?)[[:space:]]*$";
}
END{if(_error)exit;
if (brace>0){error("invalid:unclosed section");}
if(!second_section){error("invalid:less than 2 sections present")}}
#section start
/^data[[:space:]]+\{[[:space:]]*$/{if(brace>0){error("invalid:nested brace");}brace+=1;next;}
#section end
/^\}[[:space:]]*$/{brace-=1;if(brace<0){error("invalid:unmatched brace")}index_=0;
if(!first_section){first_section=1;next;}
if(!second_section){second_section=1;}
next;}
#record
$0~record_pattern{
match($0,record_pattern,m); #awk cannot capture groups from the line pattern; the 3-argument match() is a gawk extension
if(brace==0)error("invalid:record outside a section");
if(index_==1){
  if(!first_section){v1=m[1];}
  else if(!second_section){v2=m[1];}}
 index_++;next;
}
#anything else
{error("invalid:unrecognized syntax");}
#in the very end and if there were no errors
END{print v1/v2;}

That said, equivalent programs in Perl or Python would be much more readable (and thus more maintainable).
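The three-argument match() used in the script is a gawk extension, so it will not run under every awk. As a rough sketch of the same extraction in POSIX awk, with most of the validation dropped, the function below counts metric lines per block and divides the second value of the first block by that of the second. The name metric_ratio is hypothetical, and the input is assumed to contain only "data {", "metric: N" and "}" lines.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer above): reads the tool's
# output on stdin and prints
#   (2nd metric of block 1) / (2nd metric of block 2).
metric_ratio() {
    awk '
        $1 == "data"    { block++; idx = 0; next }               # a new "data {" block starts
        $1 == "metric:" { if (++idx == 2) v[block] = $2; next }  # remember the 2nd metric
        END {
            if (block < 2 || v[2] == 0) {
                print "need two blocks and a non-zero divisor" > "/dev/stderr"
                exit 1
            }
            print v[1] / v[2]
        }
    '
}

# Sample input copied from the question.
sample='data {
  metric: 0
  metric: 1234.5
  metric: 230499
  metric: 234234
}
data {
  metric: 0
  metric: 6789
  metric: 23526
  metric: 234634767
}'

printf '%s\n' "$sample" | metric_ratio   # prints 0.181838
```

The result is printed with awk's default output format (six significant digits); use printf inside the END block if more precision is needed.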

Upvotes: 0

chw21

Reputation: 8140

Here's a solution using awk:

#!/usr/bin/awk -f
BEGIN {
    FS=" *\n? *[a-zA-Z]*: *"
    RS="} *\n"
}
NR<=2 { a[NR] = $3 }
END { print (a[1]/a[2]) }

You can use that file with the command:

$ awk -f <awk-file> <data-file>

Or you can make it executable and call it directly.

awk separates the input data into records which in turn are separated into fields. In the beginning, I carefully craft the record and field separators, so that the interesting metric is in the 3rd field of a record. (The first field is data {)

Then for the first and second record, I store the 3rd fields in an array a.

At the end, I print the ratio between the first and second elements of the array.

Update: I managed to get it down to 3 lines:

BEGIN { RS="} *\n" }
NR<=2 { a[NR] = $6 }
END { print (a[1]/a[2]) }

Without setting the field separator, it remains at default. So $1 is data, $2 is {, $3 is the first metric:, $4 is the first number, $5 is the second metric: and $6 is the number we want.
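To check the field numbering described above, a small sketch can print the first six fields of the first record. The helper name show_fields is hypothetical, and the awk in use is assumed to accept a regular expression as RS (gawk and mawk do).

```shell
#!/bin/sh
# Hypothetical helper: print the first six default-split fields of record 1.
show_fields() {
    awk '
        BEGIN { RS = "} *\n" }   # same record separator as in the answer
        NR == 1 { for (i = 1; i <= 6; i++) printf "$%d = %s\n", i, $i }
    '
}

# First block of the sample input from the question.
block='data {
  metric: 0
  metric: 1234.5
  metric: 230499
  metric: 234234
}'

printf '%s\n' "$block" | show_fields
# $1 = data
# $2 = {
# $3 = metric:
# $4 = 0
# $5 = metric:
# $6 = 1234.5
```

Because the record separator is no longer a newline, the default field splitting treats newlines like any other whitespace, which is why the fields run across several input lines.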

Upvotes: 0

styko

Reputation: 701

It looks like one of your requirements is to use bash commands (grep, sed, etc.) only. But you have to be aware that you will need something else to do your decimal division. The simplest choice is bc.

Here is my suggestion using grep, sed, cut and bc. I did not try to make it compact. In theory, you should be able to do the extraction with a single big sed command!

./yourProgram | grep metric | sed -n 2~4p | sed -r 's/^\s+//' | cut -f2 -d' ' | sed 'N;s_\n_ / _' | bc -l

Let's go through it slowly:

  • grep metric selects the lines containing "metric"
  • sed -n 2~4p selects one line out of four, starting from the second line
  • sed -r 's/^\s+//' strips the whitespace at the beginning of each line. -r is the extended-regex option (to use \s and +); it is not mandatory, but it makes the expression look nicer. On macOS, use -E instead
  • cut -f2 -d' ' selects the 2nd field of each line (the delimiter being a space)
  • sed 'N;s_\n_ / _' replaces the newline with " / ". Note that "_" is used as the delimiter instead of "/" so that the "/" in the replacement needs no escaping
  • bc -l does the operation
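The 2~4p step address in the pipeline above is a GNU sed extension. As a hedged sketch of a more portable variant, and not the exact pipeline from this answer, the hypothetical function metric_expr below builds the expression for bc using only POSIX sed and paste. It assumes four metric lines per block, as in the sample.

```shell
#!/bin/sh
# Hypothetical helper: reads the tool's output on stdin and prints "a/b",
# where a and b are the 2nd metric of the first and of the second block.
metric_expr() {
    sed -n 's/^[[:space:]]*metric:[[:space:]]*\([0-9.][0-9.]*\).*$/\1/p' |  # keep only the numbers
    sed -n '2p;6p' |                                                        # 2nd metric of each block (4 per block assumed)
    paste -s -d '/' -                                                       # join the two lines with "/"
}

# Sample input copied from the question.
sample='data {
  metric: 0
  metric: 1234.5
  metric: 230499
  metric: 234234
}
data {
  metric: 0
  metric: 6789
  metric: 23526
  metric: 234634767
}'

printf '%s\n' "$sample" | metric_expr   # prints 1234.5/6789

# To get the ratio, pipe the expression into bc:
#   ./yourProgram | metric_expr | bc -l
```

The function stops at the expression so that the extraction can be checked separately from the arithmetic.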

Upvotes: 1

glenn jackman

Reputation: 247012

Here's an obscure answer: Tcl. The syntax of that output is similar to Tcl syntax, so we can define a procedure named data and a procedure named metric: and execute that output like a Tcl script. You'd run it like this:

tclsh pct.tcl <(the process that produces the output)

And the "pct.tcl" script is:

#!/usr/bin/env tclsh

set n 0
set values [dict create]

proc data {block} {
    uplevel 1 $block
    incr ::n
}

proc metric: {value} {
    dict lappend ::values $::n $value
}

source [lindex $argv 0]

foreach num [dict get $values 0] denom [dict get $values 1] {
    if {$denom == 0} {
        puts "$num / $denom = Inf"
    } else {
        puts [format "%s / %s = %.2f" $num $denom [expr {double($num) / $denom}]]
    }
}

Output:

0 / 0 = Inf
1234.5 / 6789 = 0.18
230499 / 23526 = 9.80
234234 / 234634767 = 0.00

Upvotes: 1
