laurent01

Reputation: 125

gawk match function parameter as regular expression

I get an error when I use gawk. Below are my script and an example file. Can you help me? I think the regex is right, but there is an error when it is passed to the match function. I have tried various approaches, such as escaping the regex's special characters with \ or doubling it.

$ cat script.sh

#!/bin/bash
gawk '
BEGINFILE{
        while( getline < FILENAME > 0 ){
                print match($0, /[0-9]+\.[0-9]+(?= ops/s)/)
                print $0
        }
}
' ./file

$ cat file

123.456: IO Summary: 123456 ops 1234.567 ops/s 100/100 rd/wr   1.0mb/s 1.111ms/op

$ sh script.sh

gawk: cmd. line:4: error: Unmatched ( or \(: /[0-9]+\.[0-9]+(?= ops/

Upvotes: 4

Views: 2365

Answers (3)

James Brown

Reputation: 37454

Another awk, with a for loop over the fields (to slow it down) instead of a regex. It outputs the match and then the record, as the while loop in your sample code would. If I have misunderstood your intention, please add the expected output to the original question:

$ awk '{
    for(i=2;i<=NF;i++)        # loop from second field to the end
        if($i=="ops/s")       # if ith field is ops/s
            print $(i-1)      # print previous field
}1' file                      # output the record

Output:

1234.567
123.456: IO Summary: 123456 ops 1234.567 ops/s 100/100 rd/wr   1.0mb/s 1.111ms/op

Upvotes: 3

RavinderSingh13

Reputation: 133680

With your shown samples, please try the following. It matches numbers with or without a fractional part.

awk '
match($0,/[0-9]+(\.[0-9]+)? +ops\/s/){
  val=substr($0,RSTART,RLENGTH)
  sub(/ .*/,"",val)
  print val
}
'  Input_file

Explanation: a detailed explanation of the above.

awk '                                     ##Start the awk program.
match($0,/[0-9]+(\.[0-9]+)? +ops\/s/){    ##Use match() to find the regex [0-9]+(\.[0-9]+)? +ops\/s in the current line.
  val=substr($0,RSTART,RLENGTH)           ##Store the matched substring of the current line in val.
  sub(/ .*/,"",val)                       ##Remove everything from the first space to the end of val.
  print val                               ##Print val.
}
' Input_file                              ##Name of the input file.

Upvotes: 5

anubhava

Reputation: 785731

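Regexes in awk and gnu-awk don't support lookaheads such as (?= ops/s). In addition, the unescaped / in ops/s ends the regex literal early, which is why the error message shows the pattern cut off at ops/ with an unmatched (. You can use this alternative gnu-awk command: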

awk 'match($0, /([0-9]+\.[0-9]+) ops\/s/, m) {print m[1]}' file

1234.567

Here is a POSIX-compliant awk command to do the same:

awk 'match($0, /[0-9]+\.[0-9]+ ops\/s/) {
   print substr($0, RSTART, RLENGTH-6)}' file

1234.567
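
The RLENGTH-6 above works because the literal suffix " ops/s" is six characters long. As a minimal sketch, the same command can derive that number from the suffix itself instead of counting it by hand. The suffix variable and the ops_value function name are illustrative assumptions, not part of the command above:

```shell
#!/bin/sh
# ops_value: hypothetical helper; prints the number that precedes " ops/s"
# on each input line read from stdin.
ops_value() {
    awk -v suffix=' ops/s' '
    match($0, /[0-9]+\.[0-9]+ ops\/s/) {
        # RLENGTH covers the number plus the suffix, so subtract the
        # suffix length to keep only the number.
        print substr($0, RSTART, RLENGTH - length(suffix))
    }'
}

# The sample line from the question.
printf '%s\n' '123.456: IO Summary: 123456 ops 1234.567 ops/s 100/100 rd/wr   1.0mb/s 1.111ms/op' | ops_value
```

With the sample line from the question this should print 1234.567, the same as the command above. Note that the suffix still appears twice (in the regex and in the variable), so both must be changed together.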

However, if there can be multiple matches per line, then use:

awk '{
   s = $0
   while (match(s, /([0-9]+\.[0-9]+) ops\/s/, m)) {
      print m[1]
      s = substr(s, RSTART + RLENGTH)
   }
}' file
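
The loop above uses the three-argument match(), which is a gawk extension. A sketch of the same loop for any POSIX awk, using the two-argument match() together with RSTART and RLENGTH, might look like this. The extract_ops function name and the sample line with two values are made up for illustration:

```shell
#!/bin/sh
# extract_ops: hypothetical helper; prints every number followed by " ops/s"
# on each input line, one per output line.
extract_ops() {
    awk '{
        s = $0
        # Two-argument match() is POSIX; it sets RSTART and RLENGTH.
        while (match(s, /[0-9]+\.[0-9]+ ops\/s/)) {
            # Drop the trailing " ops/s" (6 characters) from the match.
            print substr(s, RSTART, RLENGTH - 6)
            # Continue searching after the end of this match.
            s = substr(s, RSTART + RLENGTH)
        }
    }'
}

# Made-up sample line containing two rates.
printf '%s\n' 'read 12.5 ops/s write 7.25 ops/s' | extract_ops
```

For the made-up line this should print 12.5 and then 7.25, each on its own line.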

Upvotes: 6
