Reputation: 11244
I have the string below, from which I want to extract the class values "ruby", "html", and "java". My goal is to understand and learn regular expressions, which I have always dreaded :-).
<div class="ruby" name="ruby_doc">
<div class="html" name="html_doc">
<div class="java" name="java_doc">
This is what I have so far:
str = <<END
<div class="ruby" name="ruby_doc">
<div class="html" name="html_doc">
<div class="java" name="java_doc">
END
str.scan(/"[^"]+/)
#=> ["\"ruby", "\" name=", "\"ruby_doc", "\">\n<div class=", "\"html", ...]
str.scan(/class="[^"]+/) #=> ["class=\"ruby", "class=\"html", "class=\"java"]
str.scan(/"(\w+?)"/) #=> [["ruby"], ["ruby_doc"], ["html"], ["html_doc"], ...]
Upvotes: 1
Views: 1472
Reputation: 54984
How about:
str.scan /"(.*?)"/
#=> [["ruby"], ["ruby_doc"], ["html"], ["html_doc"], ["java"], ["java_doc"]]
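Note that this pattern captures every quoted value, so the name values come back along with the class values. A minimal sketch of one way to narrow it, assuming the attribute is always written exactly as `class="..."` with double quotes (the variable names `all_quoted` and `class_values` are just for illustration):

```ruby
str = <<END
<div class="ruby" name="ruby_doc">
<div class="html" name="html_doc">
<div class="java" name="java_doc">
END

# Every quoted value is captured, so the name values are included too.
# scan returns one array per match when the pattern has a group; flatten unwraps them.
all_quoted = str.scan(/"(.*?)"/).flatten

# Prefixing the pattern with class= restricts the capture to class values.
class_values = str.scan(/class="(.*?)"/).flatten
```

Here `all_quoted` holds all six values, while `class_values` holds only the three class values.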
Upvotes: -3
Reputation: 168091
str.scan(/\b(?<=class=\")[^"]+(?=\")/)
# => ["ruby", "html", "java"]
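The lookbehind `(?<=...)` and lookahead `(?=...)` only assert what surrounds the match without consuming it, which is why `scan` returns a flat array of plain strings here. A small sketch comparing the two forms on a single line of the input; the leading `\b` is left out, on the assumption that it is not needed for this input:

```ruby
str = '<div class="ruby" name="ruby_doc">'

# Lookarounds check the context but are not part of the match itself.
# With no capture group, scan returns the whole matches as strings.
with_lookaround = str.scan(/(?<=class=")[^"]+(?=")/)

# Without lookarounds, the surrounding class=" and closing quote are matched too.
without_lookaround = str.scan(/class="[^"]+"/)
```

`with_lookaround` contains just `ruby`, while `without_lookaround` contains the full `class="ruby"` text.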
Upvotes: 7
Reputation: 29291
You really should use Nokogiri as per @Arup's answer. But, if you insist...
str.scan(/(?:class\=\")(\w+)(?:\")/).flatten
2.0.0p247 :001 > str = <<END
2.0.0p247 :002"> <div class="ruby" name="ruby_doc">
2.0.0p247 :003"> <div class="html" name="html_doc">
2.0.0p247 :004"> <div class="java" name="java_doc">
2.0.0p247 :005"> END
=> "<div class=\"ruby\" name=\"ruby_doc\">\n<div class=\"html\" name=\"html_doc\">\n<div class=\"java\" name=\"java_doc\">\n"
2.0.0p247 :006 > str.scan(/(?:class\=\")(\w+)(?:\")/).flatten
=> ["ruby", "html", "java"]
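As far as I can tell, the non-capturing groups and backslashes around the literal text are optional here, and `\w+` will skip class names containing characters such as `-`. A rough sketch illustrating both points; the second `div` with `my-class` is a made-up input, not from the question:

```ruby
# Hypothetical input: the second class name contains a hyphen.
str = %(<div class="ruby" name="ruby_doc">\n<div class="my-class" name="x">\n)

# The pattern from above, with non-capturing groups and escaped = and ".
verbose = str.scan(/(?:class\=\")(\w+)(?:\")/).flatten

# Same behavior without the extra groups and escapes.
simple = str.scan(/class="(\w+)"/).flatten

# \w does not match '-', so "my-class" is skipped by both patterns above.
# [^"]+ accepts anything up to the closing quote instead.
permissive = str.scan(/class="([^"]+)"/).flatten
```

`verbose` and `simple` both return only `ruby`, whereas `permissive` also picks up `my-class`.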
Upvotes: 1
Reputation: 2717
Parsing HTML with regex is not recommended. If you must use one, a somewhat tolerable regex would be:
str.scan /<div\s+class=\s*"([^"]+)/
#=> [["ruby"], ["html"], ["java"]]
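One limitation worth knowing: this pattern expects `class` to be the first attribute right after `<div`. A sketch of a slightly more tolerant variant, tested against made-up input where the attribute order and spacing vary; it is still not a real HTML parser and assumes double-quoted values with no `>` inside earlier attributes:

```ruby
# Hypothetical input: class is not always the first attribute.
str = <<END
<div class="ruby" name="ruby_doc">
<div name="html_doc" class="html">
<div id="x" class = "java">
END

# The pattern from above only matches when class directly follows <div.
original = str.scan(/<div\s+class=\s*"([^"]+)/).flatten

# [^>]*? lazily skips earlier attributes inside the same tag;
# \s before class avoids matching attributes such as data-class.
tolerant = str.scan(/<div\b[^>]*?\sclass\s*=\s*"([^"]*)"/).flatten
```

On this input `original` finds only `ruby`, while `tolerant` finds all three class values.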
Upvotes: 2
Reputation: 118271
Use Nokogiri for this:
require 'nokogiri'
doc = Nokogiri::HTML::Document.parse <<-_html_
<div class="ruby" name="ruby_doc">
<div class="html" name="html_doc">
<div class="java" name="java_doc">
_html_
# to get values of class attribute
doc.xpath('//div/@class').map(&:to_s)
# => ["ruby", "html", "java"]
# to get values of name attribute
doc.xpath('//div/@name').map(&:to_s)
# => ["ruby_doc", "html_doc", "java_doc"]
Upvotes: 3