Reputation: 849
I'm a complete newbie with regular expressions, and I think the problem in my code lies in the regular expression I pass to awk's match function.
#!/bin/bash
...
line=$(sed -n '167p' models.html)
echo "line: $line"
cc=$(awk -v regex="[0-9]" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH+1); print pattern_match}')
echo "cc: $cc"
The result is:
line: <td><center>0.97</center></td>
cc:
In fact, I want to extract the numerical value 0.97 into variable cc.
Upvotes: 1
Views: 609
Reputation: 10865
Three things:
You need to pass the value of line into awk with -v:
awk -v line="$line" ...
Your regular expression only matches a single digit. To match a float, you want something like
[0-9]+\.[0-9]+
There is no need to add 1 to the match length for the substring:
substr(line, RSTART, RLENGTH)
Putting it all together:
line='<td><center>0.97</center></td>'
echo "line: $line"
cc=$(awk -v line="$line" -v regex="[0-9]+[.][0-9]+" 'BEGIN { match(line, regex); pattern_match=substr(line, RSTART, RLENGTH); print pattern_match}')
echo "cc: $cc"
Result:
line: <td><center>0.97</center></td>
cc: 0.97
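To see why RLENGTH needs no adjustment, it may help to print what match() actually sets. The following is only a sketch using the same sample line; the variable name info is made up for illustration. It uses [.] rather than \. for the literal dot, since backslash escapes inside an awk string can be handled differently across awk implementations.

```shell
#!/bin/bash
line='<td><center>0.97</center></td>'

# match() returns the start position (0 if nothing matched) and sets
# RSTART (1-based start index) and RLENGTH (length of the match).
info=$(awk -v line="$line" 'BEGIN {
    if (match(line, /[0-9]+[.][0-9]+/))
        print RSTART, RLENGTH, substr(line, RSTART, RLENGTH)
    else
        print "no match"
}')

echo "$info"   # prints: 13 4 0.97
```

RLENGTH is already the full length of the matched text, so substr(line, RSTART, RLENGTH) returns exactly the match, while RLENGTH+1 would also pick up the following < character.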
Upvotes: 1
Reputation: 74615
You need to pass the shell variable $line to awk, otherwise it cannot be used within the script.
If you want to match the . as well as the digits, you'll have to add that to your regular expression. Try something like this:
cc=$(awk 'NR == 167 && match($0, /[0-9.]+/) { print substr($0, RSTART, RLENGTH) }' models.html)
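A quick way to try this approach without the real models.html is to run it against a small throwaway file. This is a sketch under assumptions: the temporary file, its contents, and the line number 3 are stand-ins for the real file and line 167. It also uses the stricter pattern [0-9]+[.][0-9]+, because [0-9.]+ would accept a lone dot or something like 1.2.3 as well.

```shell
#!/bin/bash
# Hypothetical stand-in for models.html: the value of interest is on line 3.
tmp=$(mktemp)
printf '%s\n' \
    '<tr>' \
    '<td><center>model A</center></td>' \
    '<td><center>0.97</center></td>' \
    '</tr>' > "$tmp"

# Let awk pick the line itself (NR == n) and extract the first float on it.
cc=$(awk -v n=3 'NR == n && match($0, /[0-9]+[.][0-9]+/) {
    print substr($0, RSTART, RLENGTH)
    exit   # stop reading once the target line has been handled
}' "$tmp")

echo "cc: $cc"   # prints: cc: 0.97
rm -f "$tmp"
```

Reading the file directly in awk avoids the extra sed call and the need to pass a shell variable in with -v.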
Upvotes: 2