Julien Rouvier

Reputation: 316

Shell script using sed to remove HTML tag containing specific text

I am looking for a way to delete (with sed if possible) an HTML tag containing a specific word, for example every div tag containing the word foo. The divs can of course span multiple lines. For instance:

<body>
    <div>
        This div will be <i>deleted</i>.
        Why ?
        Because it contains foo.
    </div>

    <div>
        This div doesn't contain the forbidden word.
        <b>So it won't be deleted.</b>
    </div>
</body>

I found ways to delete HTML tags, but nothing about tags containing specific text. Thanks!

Upvotes: 0

Views: 883

Answers (1)

iptable

Reputation: 21

This is not practical with sed alone. sed is a line-oriented processor with no notion of nested tags, so a script built on sed/bash/grep would have to implement its own parser that reads each div's contents and prints only the divs that don't contain the unwanted text. Seriously, use an HTML parser instead.
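To illustrate the parser route, here is a minimal sketch in Python using only the standard library's `html.parser`. The names `DivFilter` and `remove_divs_containing` are made up for this example. It assumes a plain substring match (so "foo" also matches "food"), that the word sits in a single text run rather than being split across inline tags, and that with nested divs only the innermost div holding the word should go. Treat it as a starting point, not a robust HTML rewriter.

```python
from html.parser import HTMLParser


class DivFilter(HTMLParser):
    """Re-emit HTML, dropping each <div> whose own text contains `word`."""

    def __init__(self, word):
        # convert_charrefs=False keeps entities such as &amp; intact on output.
        super().__init__(convert_charrefs=False)
        self.word = word
        # Stack of [parts, has_word] frames. Frame 0 is the document itself;
        # every open <div> pushes one more frame.
        self.frames = [[[], False]]

    def emit(self, text):
        # Append raw markup or text to the innermost open frame.
        self.frames[-1][0].append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.frames.append([[], False])
        self.emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        self.emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.emit(f"</{tag}>")
        if tag == "div" and len(self.frames) > 1:
            parts, has_word = self.frames.pop()
            if not has_word:
                # Keep the div: hand its markup to the enclosing frame.
                self.emit("".join(parts))

    def handle_data(self, data):
        if self.word in data:
            self.frames[-1][1] = True
        self.emit(data)

    def handle_entityref(self, name):
        self.emit(f"&{name};")

    def handle_charref(self, name):
        self.emit(f"&#{name};")

    def handle_comment(self, data):
        self.emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.emit(f"<!{decl}>")

    def result(self):
        # Any div left unclosed at end of input is flushed as-is.
        return "".join("".join(parts) for parts, _ in self.frames)


def remove_divs_containing(html, word):
    parser = DivFilter(word)
    parser.feed(html)
    parser.close()
    return parser.result()
```

Calling `remove_divs_containing(text, "foo")` on the markup from the question should return the body with only the second div left. The whitespace that surrounded the removed div stays behind, so you may get a blank line where it used to be.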

Upvotes: 2
