How to get only lines with a single quote using GNU sed in Bash shell?

Question

I'm writing a script to parse a text file (multiple lines). I need to print only lines matching the following pattern:

First character of the line is an Uppercase letter
Second character of the line is a lowercase letter OR a single quote
Third character of the line is a lowercase letter OR a space

Examples of "valid" lines

Abcd
A'cd
Ab c

Attempts with GNU sed 4.2.2 on Linux

I ] First attempt (escaping)

$ html2text foo.html | sed -r "/^([A-Z][a-z\'])/!d"

Produces the following error message:

html2text foo.html | sed -r "/^([A-Z][a-z\'])/date"

sed: -e expression n°1, character 19: extra characters after command

II ] Second attempt (no escaping)

$ html2text foo.html | sed -r "/^([A-Z][a-z'])/!d"

Produces the following error message:

html2text foo.html | sed -r "/^([A-Z][a-z'])/date"

sed: -e expression n°1, character 18: extra characters after command

I'm not quite sure how to deal with a single quote "'" within a range. I know that escaping a single quote within a single-quoted sed expression is not supported at all, but here both sed expressions are double-quoted.

The weird thing is that both error messages echo ".../date" (first line of each error message), which appears to be a bug or parsing issue (the "/!d" flag is misinterpreted)...

Note: html2text converts 'foo.html' to a text file. The sed -r option enables extended regular expressions. "[A-Z]" matches a range of characters (the square brackets are not literals here).

Thanks for your help

jineff · Accepted Answer

As pointed out by casimir-et-hippolyte, using grep is simpler here:

grep "^[A-Z][a-z'][a-z ]"

or using sed:

sed -n "/^[A-Z][a-z'][a-z ]/p"
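
A note on why both attempts failed, plus a small sketch for checking the pattern. In an interactive Bash session, history expansion is still performed inside double quotes, so the "!d" was replaced by the most recent command starting with "d" (apparently date) before sed ever ran; that is why the echoed command ends in /date" and sed complains about extra characters after the d command. Printing with -n and p avoids the exclamation mark altogether. In the sketch below, the three rejected sample lines, the variable names, and the LC_ALL=C setting (to keep [A-Z] and [a-z] strictly ASCII) are my own assumptions for illustration, not part of the question.

```shell
# Hypothetical sample input: the first three lines are the "valid" examples
# from the question; the last three are made-up lines that should be rejected.
input=$(printf '%s\n' "Abcd" "A'cd" "Ab c" "abcd" "AB c" "A1cd")

# Assumption: force the C locale so [A-Z] and [a-z] match ASCII letters only.
export LC_ALL=C

# grep prints matching lines by default.
grep_out=$(printf '%s\n' "$input" | grep "^[A-Z][a-z'][a-z ]")

# sed -n turns off automatic printing; p prints only the matching lines.
sed_out=$(printf '%s\n' "$input" | sed -n "/^[A-Z][a-z'][a-z ]/p")

# To keep the original "delete non-matching lines" form, single-quote the
# script so Bash leaves the exclamation mark alone, and write the quote
# inside the bracket expression as '\''.
del_out=$(printf '%s\n' "$input" | sed '/^[A-Z][a-z'\''][a-z ]/!d')

printf '%s\n' "$sed_out"
```

All three commands keep only Abcd, A'cd and Ab c. If you prefer to stay with double quotes and "!d" at an interactive prompt, disabling history expansion first with set +H should also work.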
