g0lem

Reputation: 657

How to get only lines with a single quote using GNU sed in Bash shell?

I'm writing a script to parse a text file (multiple lines). I need to print only lines matching the following pattern:

  1. First character of the line is an Uppercase letter
  2. Second character of the line is a lowercase letter OR a single quote
  3. Third character of the line is a lowercase letter OR a space

Examples of "valid" lines

Attempts with GNU sed 4.2.2 on Linux

I ] First attempt (escaping)

$ html2text foo.html | sed -r "/^([A-Z][a-z\'])/!d"

Produces the following error message:

html2text foo.html | sed -r "/^([A-Z][a-z\'])/date"

sed: -e expression n°1, character 19: extra characters after command

II ] Second attempt (no escaping)

$ html2text foo.html | sed -r "/^([A-Z][a-z'])/!d"

Produces the following error message:

html2text foo.html | sed -r "/^([A-Z][a-z'])/date"

sed: -e expression n°1, character 18: extra characters after command

I'm not quite sure how to deal with a single quote "'" inside a bracket expression. I know that escaping a single quote within a single-quoted sed expression is not supported at all, but here both sed expressions are double-quoted.

The weird thing is that both error messages echo the command as ".../date" (first line of each error message), which appears to be a bug or parsing issue (the "/!d" part is misinterpreted)...

Note: html2text converts 'foo.html' to a text file. The sed -r option stands for extended regular expressions. "[A-Z]" matches a range of characters (square brackets are not literals here).

Thanks for your help

Upvotes: 0

Views: 136

Answers (2)

jineff

Reputation: 482

As pointed out by casimir-et-hippolyte, using grep is simpler here:

grep "^[A-Z][a-z'][a-z ]"

or using sed:

sed -n "/^[A-Z][a-z'][a-z ]/p"
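To check that the two commands above select the same lines, here is a minimal sketch. The sample lines are hypothetical (the question does not list its examples), and setting LC_ALL=C is an assumption made so that [A-Z] and [a-z] behave as plain ASCII ranges regardless of locale.

```shell
#!/usr/bin/env bash
# Hypothetical sample input: the question does not list its example lines.
sample_lines() {
  printf '%s\n' "Hello world" "I'm here" "An apple" "It's fine" "A test" "ABC" "lowercase start"
}

# LC_ALL=C is an assumption: it keeps [A-Z] and [a-z] as plain ASCII ranges.
# Inside double quotes the single quote needs no escaping.
with_grep=$(sample_lines | LC_ALL=C grep "^[A-Z][a-z'][a-z ]")
with_sed=$(sample_lines | LC_ALL=C sed -n "/^[A-Z][a-z'][a-z ]/p")

# Print the lines kept by grep; sed -n ... p should keep the same ones.
printf '%s\n' "$with_grep"
```

With this input, "It's fine" is rejected because its third character is the quote, and "A test" is rejected because its second character is a space. Neither command contains a "!", which may also be why they avoid the ".../date" rewriting seen in the question.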

Upvotes: 1

karakfa

Reputation: 67527

If you need to use single quotes around the script for some reason, this can be used to escape the single quote inside it:

sed -n '/^[A-Z][a-z'"'"'][a-z ]/p'
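A short sketch of what that quoting does, under the same assumptions as before (hypothetical sample lines, LC_ALL=C for ASCII ranges). The sequence '"'"' closes the single-quoted string, adds one double-quoted quote character, and reopens the single-quoted string; '\'' is a common equivalent. Single quotes are also relevant to the question's original !d form: the ".../date" echoed in the error output is consistent with interactive Bash history expansion replacing !d with the last command starting with "d", and that expansion is not performed inside single quotes.

```shell
#!/usr/bin/env bash
# Two spellings of the same sed script: each closes the single-quoted
# string, adds one literal single quote, and reopens the string.
pattern_a='/^[A-Z][a-z'"'"'][a-z ]/p'
pattern_b='/^[A-Z][a-z'\''][a-z ]/p'

# Show the script exactly as sed receives it.
printf '%s\n' "$pattern_a"

# Hypothetical input lines, not taken from the question.
input=$(printf '%s\n' "I'm here" "It's fine" "Hello world")

# Print form, as in the answer above.
kept=$(printf '%s\n' "$input" | LC_ALL=C sed -n "$pattern_a")

# The question's delete form, single-quoted so an interactive Bash
# leaves the ! alone instead of treating it as a history reference.
kept_delete_form=$(printf '%s\n' "$input" | LC_ALL=C sed '/^[A-Z][a-z'\''][a-z ]/!d')

printf '%s\n' "$kept"
```

Both forms should keep the same lines here; which one to use is mostly a matter of taste once the quoting is sorted out.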

Upvotes: 1
