Reputation: 47
I'm trying to parse HTML source code. In my example I'm just echoing it in, but in practice I am reading the HTML from a file.
Here is a bit of code that works, syntactically:
echo "<td>Here</td> some dynamic text to ignore <garbage> is a string</table>more junk" |
awk -v FS="(<td>|</td>|<garbage>|</table>)" '{print $2, $4}'
In the FS declaration I create 4 delimiters, which work fine, and I output the 2nd and 4th fields.
However, the 3rd field delimiter I actually need to use contains awk command characters, literally:
')">
When I change the above statement to the following, it no longer works:
echo "<td>Here</td> some dynamic text to ignore ')\"> is a string</table>more junk" |
awk -v FS="(<td>|</td>|')\">|</table>)" '{print $2, $4}'
I've tried escaping one, all, and every combination of the characters in the offending string with the \ character, but nothing is working.
Upvotes: 0
Views: 263
Reputation: 203189
This might be what you're looking for:
$ echo "<td>Here</td> some dynamic text to ignore ')\"> is a string</table>more junk" |
awk -v FS='(<td>|</td>|\047\\)">|</table>)' '{print $2, $4}'
Here is a string
In shell, always enclose strings (and command-line scripts) in single quotes unless you NEED to use double quotes to expose your string's contents to the shell, e.g. to let the shell expand a variable.
Per shell rules, though, you cannot include a single quote within a single-quote-delimited string such as 'foo'bar' (no amount of backslashes will work to escape that mid-string '), so you need to either jump back out of the single quotes to provide a single quote and then come back in, e.g. with 'foo'\''bar', or use the octal escape sequence \047 (do not use the hex equivalent as it is error prone) wherever you want a single quote, e.g. 'foo\047bar'. You then need to escape the ) twice: once for when awk converts the string to a regexp and then again when awk uses it as a regexp.
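To see both single-quote techniques side by side, here is a minimal sketch. The variable names (input, out_octal, out_shell, fs_seen) are just illustrative, and it assumes a POSIX-style awk that processes escape sequences in -v assignments. The last command prints FS from a BEGIN block, which shows what is left after awk's string pass and before the regexp pass:

```shell
# Sample line from the question; \" inside double quotes yields a literal ".
input="<td>Here</td> some dynamic text to ignore ')\"> is a string</table>more junk"

# 1. Octal escape: awk itself turns \047 into ' and \\ into \ when it
#    processes the -v assignment, leaving \) for the regexp engine.
out_octal=$(printf '%s\n' "$input" |
  awk -v FS='(<td>|</td>|\047\\)">|</table>)' '{print $2, $4}')

# 2. Leave the single quotes, supply an escaped ', and come back in:
#    here the shell provides the quote instead of awk.
out_shell=$(printf '%s\n' "$input" |
  awk -v FS='(<td>|</td>|'\''\\)">|</table>)' '{print $2, $4}')

# What FS holds after awk's string processing (one backslash remains).
fs_seen=$(awk -v FS='(<td>|</td>|\047\\)">|</table>)' 'BEGIN { print FS }')

printf '%s\n' "$out_octal" "$out_shell" "$fs_seen"
```

Both forms should hand awk the same FS, so which one you use is mostly a matter of readability.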
If you had been using double quotes around the string, you'd have needed one additional escape for when the shell parsed the string, but that's not needed when you surround your string in single quotes, since they stop the shell from interpreting the string.
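For comparison, a rough sketch of what the double-quoted form might look like, under the same assumptions about awk as above (the names input, out_dq and out_sq are illustrative). Inside double quotes the shell reduces \\ to \ and \" to ", so a third backslash is needed before the ) and the " has to be escaped as well:

```shell
input="<td>Here</td> some dynamic text to ignore ')\"> is a string</table>more junk"

# Double-quoted FS: the shell turns \\\) into \\) and \" into " first,
# then awk's string pass turns \\) into \) for the regexp engine.
out_dq=$(printf '%s\n' "$input" |
  awk -v FS="(<td>|</td>|'\\\)\">|</table>)" '{print $2, $4}')

# Single-quoted FS from the answer, for comparison.
out_sq=$(printf '%s\n' "$input" |
  awk -v FS='(<td>|</td>|\047\\)">|</table>)' '{print $2, $4}')

printf '%s\n' "$out_dq" "$out_sq"
```

The single-quoted version has one less layer to reason about, which is the point of the advice above.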
Upvotes: 2