Reputation: 41
Could anyone tell me how I can match from the start of a <div> tag to the end of its </div> tag with a regular expression in Ruby?
For example, let's say I have:
<div>
<p>test content</p>
</div>
So far I have this:
< div [^>]* > [^<]*<\/div>
but it doesn't seem to work.
Upvotes: 0
Views: 2494
Reputation:
Nokogiri is great, but IMHO there are situations where it can't be used.
For a simple case like yours, you can use this:
str = "<div>\n<p>test content</p>\n</div>"
puts str.scan(/<div>(.*)<\/div>/im).flatten.first
<p>test content</p>
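To see why the i and m flags matter, here is a minimal sketch using the sample markup from the question. The m flag lets . match newlines, so the capture can span the inner lines. Because .* is greedy, the capture also runs to the last </div> in the string, which matters as soon as there is more than one div. The second sample string is made up for illustration.

```ruby
# The sample markup from the question.
str = "<div>\n<p>test content</p>\n</div>"

# With /m, "." also matches newlines, so the capture spans the inner lines.
inner = str[/<div>(.*)<\/div>/im, 1]
p inner          # => "\n<p>test content</p>\n"
p inner.strip    # => "<p>test content</p>"

# Without /m, "." stops at newlines and nothing matches.
p str[/<div>(.*)<\/div>/i, 1]    # => nil

# Greedy .* runs to the LAST </div>, so sibling divs get merged.
two = "<div>a</div><div>b</div>"
p two[/<div>(.*)<\/div>/im, 1]   # => "a</div><div>b"

# Lazy .*? stops at the FIRST </div> instead.
p two[/<div>(.*?)<\/div>/im, 1]  # => "a"
```

Even the lazy version stops at the first </div> it sees, so nested divs will still trip it up.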
Upvotes: 2
Reputation: 160551
To match the opening <div> tag, use:
/<div[^>]*>/
Since [^>] also matches newlines, a line break inside the tag is fine, but the pattern will break if there is whitespace between < and div, which there could be, or if a quoted attribute value contains a >.
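A quick way to see where the simple opening-tag pattern falls short is to run it against a few variants. This is a minimal sketch; the sample tags are made up for illustration, and OPEN_DIV is just a name for the pattern above.

```ruby
# The simple opening-tag pattern (OPEN_DIV is a hypothetical name).
OPEN_DIV = /<div[^>]*>/

# A plain opening tag with an attribute matches as expected.
p '<div class="box">'[OPEN_DIV]   # => "<div class=\"box\">"

# Whitespace between < and div defeats it.
p '< div>'[OPEN_DIV]              # => nil

# A > inside a quoted attribute value ends the match early.
p '<div title="a>b">'[OPEN_DIV]   # => "<div title=\"a>"

# Uppercase tags need the i flag.
p '<DIV>'[OPEN_DIV]               # => nil
p '<DIV>'[/<div[^>]*>/i]          # => "<DIV>"
```

Each fix (adding i, handling quoted attributes, allowing stray whitespace) makes the pattern longer and harder to read.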
Eventually, after you've added all the extra checks for the possible ways a tag can be written, you'll want a better approach: a parser such as Nokogiri, which makes working with HTML and XML much easier.
For instance, since you're trying to tear apart the HTML:
<div>
<p>test content</p>
</div>
it's pretty easy to guess you really want to get to "test content". What if the HTML changed to:
<div><p>test content</p></div>
or worse:
<div
><p>
test
content
</div>
A browser won't care, nor will a good parser, but a regex will get upset and require rework.
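To make that concrete, here is a minimal sketch that runs the question's attempt against the three variants above. It assumes the spaces in the question's pattern were not intended and reads it as /<div[^>]*>[^<]*<\/div>/. None of the variants match, because [^<]* cannot step past the < that opens the inner <p>.

```ruby
# The question's pattern, ASSUMING the spaces in it were not intended.
ATTEMPT = /<div[^>]*>[^<]*<\/div>/

# The three variants of the markup shown above.
variants = [
  "<div>\n<p>test content</p>\n</div>",
  "<div><p>test content</p></div>",
  "<div\n><p>\ntest\ncontent\n</div>"
]

# [^<]* stops at the first "<", which belongs to <p>, not </div>,
# so none of the variants match: prints false three times.
variants.each { |html| p html.match?(ATTEMPT) }

# It only matches a div with no tags inside it at all.
p "<div>test content</div>".match?(ATTEMPT)   # => true
```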
require 'nokogiri'
require 'pp'
doc = Nokogiri.HTML(<<EOT)
<div
><p>
test
content
</div>
EOT
pp doc.at('p').text.strip.gsub(/\s+/, ' ')
# => "test content"
That's why we recommend parsers.
Upvotes: 1
Reputation: 4088
An HTML parser such as Nokogiri would probably be a better option than a regex, as PinnyM pointed out.
Here is a tutorial on the Nokogiri page that describes how to search an HTML/XML document.
This Stack Overflow question demonstrates how to do something similar to what you want using CSS selectors. Perhaps something like that would work for you.
Upvotes: 0