Reputation: 361
Suppose I have an HTML input like
<li>this is a html input line</li>
I want to filter all such input lines from a file which begin with <li> and end with </li>. My idea was to search for the pattern <li> in the first field and the pattern </li> in the last field using the awk command below:
awk '$1 ~ /\<li\>/ ; $NF ~ /\</li\>/ {print $0}'
but it looks like there is no provision to match two fields at a time, or I am making some syntax mistake. Could you please help me here?
PS: I am working on a Solaris SunOS machine.
Upvotes: 1
Views: 663
Reputation: 14490
Why not just use a regex to match the start and end of the line like
awk '/^[[:space:]]*<li>.*<\/li>[[:space:]]*$/ {print}'
though in general, if you're trying to process HTML, you'll be better off using a tool that's really designed to handle that.
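As a quick check of the whole-line pattern above, here is a minimal sketch run against a few made-up sample lines. The sample text and variable names are assumptions for illustration, and [ \t] stands in for [[:space:]] so that it also runs on awks without POSIX character classes; on Solaris you would call /usr/xpg4/bin/awk instead of plain awk.

```shell
# Hypothetical sample input: only the first two lines are complete <li> items.
sample='<li>this is a html input line</li>
  <li>indented item</li>
<p>not a list item</p>
<li>unterminated item'

# Keep lines that start with <li> and end with </li>,
# allowing spaces or tabs before and after the tags.
matched=$(printf '%s\n' "$sample" | awk '/^[ \t]*<li>.*<\/li>[ \t]*$/')
printf '%s\n' "$matched"
```

Only the first two sample lines should be printed; the paragraph and the unterminated item are dropped.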
Upvotes: 1
Reputation: 203229
There's a lot going wrong in your script on Solaris:
awk '$1 ~ /\<li\>/ ; $NF ~ /\</li\>/ {print $0}'
On Solaris, use /usr/xpg4/bin/awk rather than the default awk. There's also nawk, but it has fewer POSIX features (e.g. no support for character classes).
\<...\> are gawk-specific word boundaries. There is no awk on Solaris that would recognize those. If you were just trying to match literal characters then there's no need to escape them, as they are not regexp metacharacters.
To require both conditions you need && between them, not ; which is just the statement terminator in lieu of a newline.
The default action is {print $0}, so you don't need to write that code explicitly.
/ is the awk regexp delimiter, so you do need to escape that in mid-regexp.
With the default whitespace field separator, $1 and $NF for your sample input will be <li>this and line</li>, not <li> and </li>.
So if you DID for some reason compare multiple fields you could do:
awk '($1 ~ /^<li>.*/) && ($NF ~ /.*<\/li>$/)'
but this is probably what you really want:
awk '/^<li>.*<\/li>$/'
in which case you could just use grep:
grep '^<li>.*</li>$'
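To see the field-splitting point concretely, the sketch below prints $1 and $NF for the sample line from the question, then runs the field-based filter and the whole-line filters side by side (each with a trailing $ anchor so the line must end with </li>). The second input line and the variable names are made up for illustration; on Solaris, substitute /usr/xpg4/bin/awk for awk.

```shell
line='<li>this is a html input line</li>'

# With the default whitespace separator the tags stay glued to the words.
fields=$(printf '%s\n' "$line" | awk '{print $1; print $NF}')
printf '%s\n' "$fields"

# Hypothetical two-line input: only the first line is a complete <li> item.
input="$line
<li>no closing tag"

# Field-based test: both conditions joined with &&, default print action.
by_fields=$(printf '%s\n' "$input" | awk '($1 ~ /^<li>/) && ($NF ~ /<\/li>$/)')

# Whole-line tests: one anchored regexp, in awk and in grep.
by_line=$(printf '%s\n' "$input" | awk '/^<li>.*<\/li>$/')
by_grep=$(printf '%s\n' "$input" | grep '^<li>.*</li>$')
printf '%s\n' "$by_fields" "$by_line" "$by_grep"
```

The first printf shows <li>this and line</li> as the first and last fields, and all three filters should keep only the complete item.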
Upvotes: 3