dudu

Reputation: 849

Regular expression in awk in bash shell script

I'm a complete newcomer to regular expressions, and I think the problem in my code lies in the regular expression I pass to awk's match function.

#!/bin/bash
...
line=$(sed -n '167p' models.html)
echo "line: $line"
cc=$(awk -v regex="[0-9]" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH+1); print pattern_match}')
echo "cc: $cc"

The result is:

line:  <td><center>0.97</center></td>
cc: 

I want to extract the numerical value 0.97 into the variable cc.

Upvotes: 1

Views: 609

Answers (2)

jas

Reputation: 10865

Three things:

1. You need to pass the value of line into awk with -v:

   awk -v line="$line" ...

2. Your regular expression only matches a single digit. To match a float, you want something like:

   [0-9]+\.[0-9]+

3. There is no need to add 1 to the match length for the substring:

   substr(line, RSTART, RLENGTH)

Putting it all together:

line='<td><center>0.97</center></td>'
echo "line: $line"
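# [.] is used instead of \. because awk processes backslash escapes in -v assignments (gawk warns about \. there)
cc=$(awk -v line="$line" -v regex="[0-9]+[.][0-9]+" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH); print pattern_match}')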
echo "cc: $cc"

Result:

line: <td><center>0.97</center></td>
cc: 0.97
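
To see what each of the three fixes changes, here is a small sketch that runs the broken variants one at a time against the same sample line. The variable names no_var, one_digit and too_long are made up for this illustration, and the regex is written as an awk literal with [.] for the dot to avoid any escaping questions.

```shell
#!/bin/bash
line='<td><center>0.97</center></td>'

# 1. Without -v line=..., awk's own "line" variable is empty, so nothing matches.
no_var=$(awk 'BEGIN { match(line, /[0-9]+[.][0-9]+/); print substr(line, RSTART, RLENGTH) }')

# 2. With only [0-9], match() covers a single digit.
one_digit=$(awk -v line="$line" 'BEGIN { match(line, /[0-9]/); print substr(line, RSTART, RLENGTH) }')

# 3. RLENGTH+1 picks up one extra character after the match.
too_long=$(awk -v line="$line" 'BEGIN { match(line, /[0-9]+[.][0-9]+/); print substr(line, RSTART, RLENGTH+1) }')

printf 'no_var=[%s] one_digit=[%s] too_long=[%s]\n' "$no_var" "$one_digit" "$too_long"
# prints: no_var=[] one_digit=[0] too_long=[0.97<]
```

The empty first result is the same symptom as the empty cc in the question.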

Upvotes: 1

Tom Fenech

Reputation: 74615

  • You need to pass your shell variable $line to awk, otherwise it cannot be used within the script.
  • Alternatively, you can just read the file using awk (no need to involve sed at all).
  • If you want to match the . as well as the digits, you'll have to add that to your regular expression.

Try something like this:

cc=$(awk 'NR == 167 && match($0, /[0-9.]+/) { print substr($0, RSTART, RLENGTH) }' models.html)
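
If you want to try the awk-only approach without the real models.html, the following is a minimal sketch using a throwaway three-line file. The temporary file, its contents and the line number passed in as n are assumptions for the demo; the exit is an optional addition that stops awk from reading the rest of the file once the line has been handled.

```shell
#!/bin/bash
# Build a small stand-in for models.html (hypothetical content).
tmp=$(mktemp)
printf '<tr>\n<td><center>0.97</center></td>\n</tr>\n' > "$tmp"

# Pick line n, print the first run of digits and dots, then stop reading.
cc=$(awk -v n=2 'NR == n && match($0, /[0-9.]+/) { print substr($0, RSTART, RLENGTH); exit }' "$tmp")

rm -f "$tmp"
echo "cc: $cc"
# prints: cc: 0.97
```

Note that [0-9.]+ also accepts strings such as 1.2.3 or a lone dot, so a stricter pattern may be preferable if the line can contain those.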

Upvotes: 2
