Reputation: 657
I'm writing a script to parse a text file (multiple lines). I need to print only lines matching the following pattern:
$ html2text foo.html | sed -r "/^([A-Z][a-z\'])/!d"
Produces the following error message:
html2text foo.html | sed -r "/^([A-Z][a-z\'])/date"
sed: -e expression n°1, character 19: extra characters after command
$ html2text foo.html | sed -r "/^([A-Z][a-z'])/!d"
Produces the following error message:
html2text foo.html | sed -r "/^([A-Z][a-z'])/date"
sed: -e expression n°1, character 18: extra characters after command
I'm not quite sure how to deal with a single quote (') inside a bracket expression. I know that escaping a single quote inside a single-quoted sed expression is not supported at all, but here both sed expressions are double-quoted.
The weird thing is that both error messages echo ".../date" (first line of each error message), which appears to be a bug or parsing issue (the "!d" command seems to be misinterpreted)...
Note: html2text converts 'foo.html' to plain text. The sed -r option enables extended regular expressions. "[A-Z]" matches a range of characters (the square brackets are not literals here).
Thanks for your help
Upvotes: 0
Views: 136
Reputation: 482
As pointed out by casimir-et-hippolyte, using grep is simpler here:
grep "^[A-Z][a-z'][a-z ]"
or using sed:
sed -n "/^[A-Z][a-z'][a-z ]/p"
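Why these forms avoid the error: the `date` echoed in the question suggests that the interactive shell applied history expansion to `!d` inside the double quotes, replacing it with the most recent command starting with `d` before sed ever ran. Neither command above contains a `!`, so nothing is expanded. A minimal sketch to check that both print the same lines; the `sample` text is made up for illustration and stands in for the output of `html2text foo.html`, and `LC_ALL=C` is an added assumption so that `[A-Z]` and `[a-z]` mean plain ASCII ranges:

```shell
# Hypothetical sample input (stands in for: html2text foo.html)
sample="Hello world
it's lowercase
D'oh
X9 not matched"

# grep prints matching lines by default
with_grep=$(printf '%s\n' "$sample" | LC_ALL=C grep "^[A-Z][a-z'][a-z ]")

# sed -n suppresses automatic printing; the p command prints matches only
with_sed=$(printf '%s\n' "$sample" | LC_ALL=C sed -n "/^[A-Z][a-z'][a-z ]/p")

printf '%s\n' "$with_grep"
printf '%s\n' "$with_sed"
```

Both commands should keep only the lines that start with an uppercase letter, followed by a lowercase letter or a single quote, followed by a lowercase letter or a space.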
Upvotes: 1
Reputation: 67527
If you need to single-quote the script for some reason, you can close the single-quoted string, add the single quote inside double quotes, and reopen it:
sed -n '/^[A-Z][a-z'"'"'][a-z ]/p'
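The quoting trick can be checked with a short sketch. The `sample` text is invented for illustration, `LC_ALL=C` is an added assumption to keep the ranges ASCII-only, and the `'\''` spelling is a common alternative to `'"'"'` that is not part of the answer above:

```shell
# Hypothetical sample input (stands in for: html2text foo.html)
sample="Hello world
it's lowercase
D'oh
X9 not matched"

# '...'"'"'...' : close the single quotes, add a double-quoted ', reopen
matched=$(printf '%s\n' "$sample" | LC_ALL=C sed -n '/^[A-Z][a-z'"'"'][a-z ]/p')

# '...'\''...' : same idea, using a backslash-escaped ' between the two parts
matched_alt=$(printf '%s\n' "$sample" | LC_ALL=C sed -n '/^[A-Z][a-z'\''][a-z ]/p')

printf '%s\n' "$matched"
```

In both spellings the shell joins the adjacent quoted pieces into one argument, so sed receives the script `/^[A-Z][a-z'][a-z ]/p`.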
Upvotes: 1