Paul

Reputation: 26640

Remove specific tag with its contents using sed

I would like to remove the following tag, including its constantly varying contents, from HTML:

<span class="the_class_name">li4tuq734g23r74r7Whatever</span>

The following Bash command

.... | sed -e :a -re 's/<span class="the_class_name"/>.*</span>//g' > "$NewFile"

fails with this error:

sed: -e expression #2, char XX: unknown option to `s'

I have tried escaping quotes, slashes and less-than symbols in various combinations, but I still get this error.

Upvotes: 1

Views: 470

Answers (1)

Ted Lyngmo

Reputation: 117288

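I suggest using a different separator than / when / appears in the text you want to match. With / as the separator, the stray / in your first span and the / in </span> end the s command early, so sed tries to read span>//g as flags, and s is not a valid flag; that is where "unknown option to `s'" comes from. Also, prefer -E over -r for extended regular expressions, since -E is the POSIX-compatible spelling. Note too that the / in your first span (class="the_class_name"/>) doesn't belong there at all. Finally, .* is greedy: it runs to the last </span> on the line, so it also eats everything after the first </span>. It's better to match on [^<]*, that is, any run of characters that are not <.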

sed -E 's,<span class="the_class_name">[^<]*</span>,,g'
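To see the greediness problem and the fix side by side, here is a minimal sketch, assuming a POSIX shell and a sed that accepts -E (GNU and BSD sed both do). The sample input line and the function names strip_greedy, strip_spans and strip_spans_slash are hypothetical, made up for the demo. The last function shows that escaping each / inside the pattern as \/ is an alternative to switching the separator.

```shell
#!/bin/sh
# Hypothetical demo input: two target spans on one line plus an unrelated
# span ("other") that must survive the edit.
input='<p>A <span class="the_class_name">li4tuq734g23r74r7Whatever</span> B <span class="other">keep me</span> C <span class="the_class_name">xyz</span> D</p>'

# Hypothetical helper: .* is greedy and runs to the LAST </span> on the line,
# so the unrelated "other" span is deleted as well.
strip_greedy() {
    sed -E 's,<span class="the_class_name">.*</span>,,g'
}

# Hypothetical helper: [^<]* cannot cross a '<', so each target span is
# matched and removed on its own.
strip_spans() {
    sed -E 's,<span class="the_class_name">[^<]*</span>,,g'
}

# Hypothetical helper: same fix, keeping / as the delimiter by escaping
# every / inside the pattern as \/.
strip_spans_slash() {
    sed -E 's/<span class="the_class_name">[^<]*<\/span>//g'
}

printf '%s\n' "$input" | strip_greedy       # <p>A  D</p>
printf '%s\n' "$input" | strip_spans        # <p>A  B <span class="other">keep me</span> C  D</p>
printf '%s\n' "$input" | strip_spans_slash  # same output as strip_spans
```

A side effect of [^<]* is that a target span whose content contains another tag (a nested <b>, say) is left untouched, which is one more reason to prefer a real HTML parser.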

A better option, of course, is to use an HTML parser for this.

Upvotes: 3
