user486174

Reputation: 41

Matching a <div></div> tag with a regular expression in Ruby

Could anyone tell me how I can match from the opening <div> tag to the closing </div> tag with a regular expression in Ruby?

For example, let's say I have:

<div>
<p>test content</p>
</div>

So far I have this:

< div [^>]* > [^<]*<\/div>

but it doesn't seem to work.
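A quick check in Ruby shows why this attempt fails. The sketch below assumes the sample markup is held in a string named `html` (a name chosen for illustration). Without the `x` flag the spaces in the pattern are literal, and `[^<]*` cannot step over the `<` that starts `<p>`. The last pattern is only a sketch that happens to work for simple, non-nested markup like this sample.

```ruby
html = "<div>\n<p>test content</p>\n</div>"

# The original attempt: without the /x flag the spaces are literal,
# so the pattern looks for "< div " and never sees "<div>".
attempt = /< div [^>]* > [^<]*<\/div>/
p html.match?(attempt) # => false

# Even with the spaces removed, [^<]* cannot cross the "<" of <p>,
# so the pattern still cannot reach </div>.
no_spaces = /<div[^>]*>[^<]*<\/div>/
p html.match?(no_spaces) # => false

# Letting . match newlines (/m) and using a lazy .*? that stops at
# the first </div> captures everything between the tags.
working = /<div[^>]*>(.*?)<\/div>/m
p html[working, 1] # => "\n<p>test content</p>\n"
```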

Upvotes: 0

Views: 2494

Answers (3)

user904990

Reputation:

Nokogiri is great, but IMHO there are situations where it cannot be used.

For your simple case, you can use this:

str = "<div>\n<p>test content</p>\n</div>"
puts str.scan(/<div>(.*)<\/div>/im).flatten.first

<p>test content</p>
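The `(.*)` in that pattern is greedy, so it is fine for a single <div> but misbehaves when the string holds more than one. Here is a sketch with made-up sample strings, contrasting it with a lazy `(.*?)` variant (a tweak that is not part of the answer above) and showing that nested divs defeat both.

```ruby
str = "<div>one</div>\n<div>two</div>"

# Greedy (.*) runs to the LAST </div>, swallowing both blocks.
greedy = str.scan(/<div>(.*)<\/div>/im).flatten
p greedy # => ["one</div>\n<div>two"]

# Lazy (.*?) stops at the FIRST </div> after each <div>.
lazy = str.scan(/<div>(.*?)<\/div>/im).flatten
p lazy # => ["one", "two"]

# Nesting still breaks it: the inner </div> ends the outer match early.
nested = "<div><div>inner</div></div>".scan(/<div>(.*?)<\/div>/im).flatten
p nested # => ["<div>inner"]
```

The nested case is exactly where a regex stops being enough and a parser is the safer choice.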

Upvotes: 2

the Tin Man

Reputation: 160551

To match the <div> when it's all on one line, use:

/<div[^>]*>/

But that will break if there is whitespace, such as a space or a new-line, between < and div, which there could be.
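A small sketch of how quickly such "extra checks" pile up. The `tolerant` pattern below is a hypothetical stricter variant, not a complete solution. It allows whitespace after `<`, adds a word boundary so a made-up tag like `<divider>` is not mistaken for a div, and ignores case.

```ruby
open_div = /<div[^>]*>/
# Hypothetical stricter variant: optional whitespace after "<",
# \b so "<divider>" is rejected, /i so "<DIV>" is accepted.
tolerant = /<\s*div\b[^>]*>/i

samples = ['<div>', '<div class="note">', '< div>', '<DIV>', '<divider>']

samples.each do |tag|
  puts format('%-20s simple=%-5s tolerant=%s',
              tag, tag.match?(open_div), tag.match?(tolerant))
end
```

Even the stricter pattern still stops at the first `>`, so a `>` inside a quoted attribute value would cut the match short. Each such gap needs yet another check.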

Eventually, after you've added all the extra checks for the possible ways a tag can be written, you'll want to consider a better way: a parser like Nokogiri, which makes working with HTML and XML much easier.

For instance, since you're trying to tear apart the HTML:

<div>
<p>test content</p>
</div>

it's pretty easy to guess you really want to get to "test content". What if the HTML changed to:

<div><p>test content</p></div>

or worse:

<div
><p>
test
content
</div>

A browser won't care, nor will a good parser, but a regex will get upset and require rework.

require 'nokogiri'
require 'pp'

doc = Nokogiri.HTML(<<EOT)
    <div
    ><p>
    test
    content
    </div>
EOT
pp doc.at('p').text.strip.gsub(/\s+/, ' ')
# => "test content"

That's why we recommend parsers.
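For contrast, here is a hypothetical regex written against the first layout, run over the three variants shown earlier. It is only a sketch to illustrate the "rework" point: it copes with the first two layouts but returns `nil` for the third.

```ruby
variants = [
  "<div>\n<p>test content</p>\n</div>",
  "<div><p>test content</p></div>",
  "<div\n><p>\ntest\ncontent\n</div>"
]

# Hypothetical pattern: literal <div>, optional whitespace, then a <p>...</p>.
pattern = %r{<div>\s*<p>(.*?)</p>\s*</div>}m

# String#[] with a capture index returns the group, or nil on no match.
results = variants.map { |html| html[pattern, 1] }
p results # => ["test content", "test content", nil]
```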

Upvotes: 1

Zajn

Reputation: 4088

An HTML parser such as Nokogiri would probably be a better option than a regex, as PinnyM pointed out.

Here is a tutorial on the Nokogiri page that describes how to search an HTML/XML document.

This Stack Overflow question demonstrates something similar to what you want to accomplish using CSS selectors. Perhaps something like that would work for you.

Upvotes: 0
