Reputation: 4251
I have the following string in a variable in Ruby.
"<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"
I want to extract only the src content, that is:
"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\"
In Ruby, how can I extract the text from the src attribute?
Upvotes: 2
Views: 2315
Reputation: 10454
This should do the trick...
test = '"<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"'
src = /src=\\"(.*?)\\"/.match(test)
puts src[1] # outputs /ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg
Here is how the regex works:
1.) It looks for src=\" (we have to escape \ with \\).
2.) Once it finds it, (.*?) grabs everything until the next match; the ? makes the group lazy, so it stops at the first \" instead of the last one.
3.) The next match is \" (again, we have to escape \ with \\).
The .match method of the Regexp class returns a MatchData object (or nil if nothing matches), not a hash. Index 0 holds the whole matched text, and index 1 holds the first capture group, which is your result.
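A short sketch of what match actually hands back. It assumes the variable holds the real string, written here as a double-quoted literal: there the \" sequences become plain " characters (the backslashes in the question are how Ruby's inspect displays them), so the pattern uses a bare quote instead of \\". The answer's test string above uses single quotes, which keep the backslashes literally, which is why it needs \\". The no-match input is a made-up sample.

```ruby
html = "<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"

# Regexp#match returns a MatchData object (or nil), not a Hash.
m = /src="(.*?)"/.match(html)

puts m.class # MatchData
puts m[0]    # whole match: src="/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"
puts m[1]    # capture group 1: /ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg

# With no match there is no MatchData to index, so guard against nil.
none = /src="(.*?)"/.match("<p>no image here</p>")
puts none.nil? # true
```

Calling m[1] directly on a nil result raises NoMethodError, so check the result (or use the safe navigation operator, m&.[](1)) when the input may lack an image.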
Or, if you don't like the looks of regular expressions and would rather use CSS selectors, Nokogiri's css method can help you.
require 'nokogiri'
test = "<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"
html = Nokogiri::HTML(test)
html.css("img").attribute('src').to_s # outputs /ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg
Upvotes: 4
Reputation: 110685
To extract the string between src and style, I would use this:
text[/src=\"(.*)\"\sstyle=\"/,1]
#=> "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"
I've assumed you don't want the escaped double-quote at the end that you show in your desired output, but if you do, change the regex to:
/src=\"(.*)\sstyle=\"/
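A quick sketch running both patterns through String#[], which takes a Regexp plus a capture index and returns that group, or nil when nothing matches. It assumes text holds the real string (plain " characters). The reordered <img> tag at the end is a hypothetical input showing the pattern's limit: it relies on style coming right after src.

```ruby
text = "<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"

# String#[] with a Regexp and an index returns that capture group, or nil.
without_quote = text[/src=\"(.*)\"\sstyle=\"/, 1]
with_quote    = text[/src=\"(.*)\sstyle=\"/, 1]

p without_quote # => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"
p with_quote    # => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\""

# Hypothetical tag where style does not follow src: the pattern finds nothing.
reordered = "<img src=\"/other.jpg\" alt=\"\">"[/src=\"(.*)\"\sstyle=\"/, 1]
p reordered # => nil
```

Because (.*) is greedy, it only works here since " style=" occurs once; with several images on one line, a lazy (.*?) or a parser is the safer choice.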
Upvotes: 1
Reputation: 118271
Using REXML::Document
require 'rexml/document'
doc = REXML::Document.new("<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n")
doc.get_elements('//p/img')[0].attribute('src').to_s
# => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"
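The line above takes only the first image with [0]. If the markup may contain several images, a sketch like the following collects every src with REXML::XPath.match. The two-image string and its file names are hypothetical, and each <img> is self-closed because REXML is an XML parser and expects well-formed markup. On Ruby 3.0 and later REXML ships as a bundled gem, so depending on your setup it may need to be listed in your Gemfile.

```ruby
require 'rexml/document'

# Hypothetical sample with two self-closed images.
xml = "<p><img alt=\"\" src=\"/first.jpg\" /><img alt=\"\" src=\"/second.jpg\" /></p>"
doc = REXML::Document.new(xml)

# REXML::XPath.match returns an Array of every node matching the XPath;
# attributes['src'] returns the attribute value as a String (nil if absent).
srcs = REXML::XPath.match(doc, '//img').map { |img| img.attributes['src'] }

p srcs # => ["/first.jpg", "/second.jpg"]
```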
Upvotes: 3
Reputation: 4144
Use an HTML/XML parser; in Ruby, Nokogiri is a great choice. Example:
require 'nokogiri'
html = "<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"
doc = Nokogiri::HTML(html)
src = doc.xpath("//img")[0]['src']
In this example, XPath is used to select all img nodes, the first one is chosen, and then its src attribute is returned as a string.
Upvotes: 4