Albz

Reputation: 2030

How to match content between specific HTML tags with an attribute using grep?

Which regular expression should I use with the command grep if I wanted to match the text contained within the tag <div class="Message"> and its closing tag </div> in an HTML file?

Upvotes: 18

Views: 48637

Answers (3)

Andy Lester

Reputation: 93636

You can't do it reliably with just grep. You need to parse the HTML with an HTML parser.

What if the HTML code has something like:

<!--
<div class="Message">blah blah</div>
-->

You'll get a false hit on that commented-out code. Here are some other examples where a regex-only option will fail you.

Consider using xmlgrep from the XML::Grep Perl module, as discussed here: Extract Title of a html file using grep
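
The false hit described above can be reproduced with a small experiment. This is only a sketch: the sample file and the variable names `sample` and `false_hits` are made up for illustration.

```shell
# Hypothetical sample: the only Message div sits inside an HTML comment.
sample=$(mktemp)
printf '%s\n' '<!--' '<div class="Message">blah blah</div>' '-->' > "$sample"

# grep works line by line and knows nothing about HTML comments,
# so it still counts the commented-out line as a match.
false_hits=$(grep -c '<div class="Message">' "$sample")
echo "lines matched inside the comment: $false_hits"

rm -f "$sample"
```

A real HTML parser would skip the comment node, which is why the count above is a false positive rather than a usable result.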

Upvotes: 3

sampson-chen

Reputation: 47267

You can do that by specifying a regex:

grep -E "^<div class=\"Message\">.*</div>$" input_files

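Note that this only matches when the element sits on a single line and, because of the ^ and $ anchors, occupies that whole line. If your tag spans multiple lines, you can join the lines first. The joined input is one long line that rarely starts and ends with the tag, so drop the anchors and use -o to print only the matched part:

tr '\n' ' ' < input_file | grep -oE "<div class=\"Message\">.*</div>"

Because .* is greedy, this prints everything from the first opening tag to the last </div> in the file.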
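
The effect of the ^ and $ anchors in the first command can be checked on a throwaway file. This is a rough sketch; the sample lines and the variable names `anchored` and `unanchored` are assumptions for illustration only.

```shell
# Hypothetical sample: one div alone on its line, one wrapped in another tag.
sample=$(mktemp)
printf '%s\n' \
  '<div class="Message">alone on its line</div>' \
  '<td><div class="Message">inside a table cell</div></td>' > "$sample"

# With ^ and $ the div must fill the entire line, so only the first line counts.
anchored=$(grep -cE '^<div class="Message">.*</div>$' "$sample")

# Without the anchors both lines count.
unanchored=$(grep -cE '<div class="Message">.*</div>' "$sample")

echo "anchored=$anchored unanchored=$unanchored"
rm -f "$sample"
```

Whether the anchors are wanted depends on how the HTML is laid out; for indented or nested markup the unanchored form is usually the one that finds anything.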

Upvotes: 5

Steve

Reputation: 54392

Here's one way using GNU grep:

grep -oP '(?<=<div class="Message"> ).*?(?= </div>)' file

If your tags span multiple lines, try:

< file tr -d '\n' | grep -oP '(?<=<div class="Message"> ).*?(?= </div>)'
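
One thing to watch with the multi-line variant: deleting newlines glues the end of one line to the start of the next. A small sketch, assuming GNU grep with -P support and a made-up two-line sample (the names `sample`, `joined` and `spaced` are illustrative):

```shell
# Hypothetical sample: the message text is split across two lines.
sample=$(mktemp)
printf '%s\n' '<div class="Message"> Hello' 'world </div>' > "$sample"

pattern='(?<=<div class="Message"> ).*?(?= </div>)'

# tr -d removes the newline, so "Hello" and "world" run together.
joined=$(tr -d '\n' < "$sample" | grep -oP "$pattern")

# Replacing the newline with a space keeps the words apart.
spaced=$(tr '\n' ' ' < "$sample" | grep -oP "$pattern")

printf '%s\n' "$joined" "$spaced"
rm -f "$sample"
```

Note also that the lookarounds include a space after the opening tag and before the closing tag, so markup written as `<div class="Message">text</div>` with no surrounding spaces would not match this pattern as is.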

Upvotes: 17
