Reputation: 125
I would like to extract "toast" from a string <h1>test</h1><div>toast</div>. What regular expression could isolate such a string?
Edit: Thanks to the user who corrected the formatting.
More info: There will only ever be one div tag in the string. Its contents may change, but there will never be a second div tag (the real string is larger than the given sample).
Thanks!
Upvotes: 1
Views: 6985
Reputation: 19066
This is really not something that is typically done with regex, and for good reason. But if you must, and since you said there will never be more than a single div in the string, this should work for you:
(?<=<div>).*(?=</div>)
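As a minimal sketch of how that pattern might be applied in Ruby (the variable names `html` and `inner` are only illustrative, and the non-greedy `.*?` is a small change from the pattern above so the match stops at the first closing tag):

```ruby
html = "<h1>test</h1><div>toast</div>"

# The lookbehind (?<=<div>) and lookahead (?=<\/div>) assert the surrounding
# tags without including them in the match.
# Assumption: .*? (non-greedy) is used here instead of .* so the match ends
# at the first </div>.
inner = html[/(?<=<div>).*?(?=<\/div>)/]

puts inner
```

Note that `.` does not match newlines by default, so a div whose content spans several lines would need the `m` flag.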
Upvotes: 1
Reputation: 13054
We need more information. If the string is exactly "<h1>test</h1><div>toast</div>", then something naïve like
regex = /<h1>test<\/h1><div>([^<]*)<\/div>/
found = "<h1>test</h1><div>toast</div>".match(regex)[1]
# => "toast"
would work. My best guess at this point is that you are expecting
<h1>*</h1><div>*</div>
If so, use this:
regex = /<h1>[^<]*<\/h1><div>([^<]*)<\/div>/
found = "<h1>any string can go here</h1><div>toast</div>".match(regex)[1]
# => "toast"
Note that this breaks if there are any nested elements in either tag. A more robust solution is to use Nokogiri. Talk to your boss.
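To illustrate the nested-element caveat, here is a small sketch using the same pattern. The sample strings and variable names are made up for the example. It uses `String#[]` with a capture index, which returns `nil` on a failed match instead of raising the way `match(regex)[1]` would:

```ruby
regex = /<h1>[^<]*<\/h1><div>([^<]*)<\/div>/

flat   = "<h1>title</h1><div>toast</div>"
nested = "<h1>title</h1><div><b>toast</b></div>"

# str[regex, 1] returns capture group 1, or nil when nothing matches.
flat_result   = flat[regex, 1]
nested_result = nested[regex, 1]  # [^<]* cannot cross the nested <b> tag

puts flat_result.inspect
puts nested_result.inspect
```

The second string yields `nil` because `[^<]*` stops at the first `<`, which is the point where a real HTML parser becomes the safer choice.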
Upvotes: 1
Reputation: 118261
You can use Nokogiri.
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse("<div> test </div> <div> toast </div>")
doc.css('div').map(&:text)
# => [" test ", " toast "]
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse("<h1>test</h1><div>toast</div>")
doc.at_css('div').text
# => "toast"
Upvotes: 6