Ajay

Reputation: 4251

How to extract the img src value from a string in Ruby

I have the following string in a variable in Ruby.

"<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"

I want to extract only the value of the src attribute, that is:

"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" 

In Ruby how can I extract the text from the src attribute?

Upvotes: 2

Views: 2315

Answers (4)

Seth

Reputation: 10454

This should do the trick...

test = '"<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"'
src  = /src=\\"(.*?)\\"/.match(test)

puts src[1] # outputs /ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg

Here is how the pattern works:

1.) It looks for src=\" (we have to escape \ with \\).

2.) Once it finds that, (.*?) lazily captures everything up to the first point where the rest of the pattern matches.

3.) The rest of the pattern is \" (again, we have to escape \ with \\).
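A minimal sketch of why the backslashes matter, using a shortened, made-up img tag. The test string above is single-quoted, so each \" stays in the string as a literal backslash followed by a quote, and the pattern has to match that backslash. If the variable holds real markup (for example, built from a double-quoted literal, where \" is only an escape), there is no backslash to match and a bare quote is needed instead. The last two lines contrast the lazy (.*?) with a greedy (.*).

```ruby
# Single-quoted: each \" stays as two characters, a backslash and a quote.
with_backslashes = '<img alt=\"\" src=\"/pics/a.jpg\" style=\"width:1px\" />'
# Double-quoted: \" is just an escape, so the string holds plain quotes.
plain = "<img alt=\"\" src=\"/pics/a.jpg\" style=\"width:1px\" />"

p with_backslashes.include?("\\") # => true
p plain.include?("\\")            # => false

# The answer's pattern expects a literal backslash before each quote.
escaped_re = /src=\\"(.*?)\\"/
# This pattern expects a bare quote instead.
plain_re = /src="(.*?)"/

p escaped_re.match(with_backslashes)[1] # => "/pics/a.jpg"
p escaped_re.match(plain)               # => nil
p plain_re.match(plain)[1]              # => "/pics/a.jpg"

# Lazy (.*?) stops at the first closing quote; greedy (.*) runs to the last one.
p plain[/src="(.*?)"/, 1] # => "/pics/a.jpg"
p plain[/src="(.*)"/, 1]  # => "/pics/a.jpg\" style=\"width:1px"
```

So before picking a pattern, it is worth checking whether the string really contains backslashes or whether the \" only appears because the value was printed with p or inspect.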



The .match method of the Regexp class returns a MatchData object, or nil if there is no match. Index 0 holds the entire matched text, and index 1 holds the first capture group, which is the path you want.
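A short sketch of working with the MatchData result. It assumes the question's string was written as a double-quoted literal, so it contains plain quotes rather than backslashes and the pattern uses a bare ". It also shows the nil case and a named capture group as an alternative to numeric indexes.

```ruby
html = "<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"

m = /src="(.*?)"/.match(html)

p m.class # => MatchData
p m[0]    # => "src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\""
p m[1]    # => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"

# match returns nil when nothing matches, so guard before indexing.
miss = /src="(.*?)"/.match("<p>no image here</p>")
src = miss && miss[1]
p src # => nil

# A named group reads a little more clearly than a numeric index.
named = /src="(?<src>[^"]*)"/.match(html)
p named[:src] # => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"
```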



Or, if you don't like the look of regular expressions and would rather use CSS selectors, Nokogiri's css method can help.

require 'nokogiri'

test = "<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"

html = Nokogiri::HTML(test)
html.css("img").attribute('src').to_s # => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"

Upvotes: 4

Cary Swoveland

Reputation: 110685

To extract the string between src and style I would use this:

text[/src=\"(.*)\"\sstyle=\"/,1]
  #=> "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg" 

I've assumed you don't want the escaped double-quote at the end that you show in your desired output, but if you do, change the regex to:

/src=\"(.*)\sstyle=\"/
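A quick check of the two patterns above with String#[], assuming the question's string is held in text as a double-quoted literal. The second pattern keeps a plain closing quote in the result; the \" in the desired output is most likely just how Ruby's inspect displays that quote. Both patterns also rely on style following src, so the sketch includes a made-up tag without style to show the nil result.

```ruby
text = "<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"

# String#[](regexp, n) returns capture group n, or nil when the regexp doesn't match.
p text[/src=\"(.*)\"\sstyle=\"/, 1]
# => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"

# The second pattern leaves the closing quote inside the capture.
p text[/src=\"(.*)\sstyle=\"/, 1]
# => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\""

# Without a style attribute after src, neither pattern matches.
p '<img src="/a.jpg" alt="">'[/src=\"(.*)\"\sstyle=\"/, 1] # => nil
```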

Upvotes: 1

Arup Rakshit

Reputation: 118271

Using REXML::Document

require 'rexml/document'

doc = REXML::Document.new("<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n") 
doc.get_elements('//p/img')[0].attribute('src').to_s
# => "/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg"
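The same REXML approach extends to several images. This is a sketch on a made-up fragment: REXML::XPath.match returns every matching element, and attributes['src'] reads each value as a String. Because REXML is an XML parser, the sketch wraps the images in one div so the document has a single root element, and HTML that isn't well-formed XML (an unclosed br tag, for instance) may fail to parse; in that case an HTML parser such as Nokogiri is the safer choice. In Ruby 3.0 and later, REXML ships as a bundled gem rather than a default one.

```ruby
require 'rexml/document'

# Hypothetical fragment with two images under a single root element.
fragment = '<div><p><img alt="" src="/a.jpg" /></p><p><img src="/b.png" /></p></div>'

doc = REXML::Document.new(fragment)

# Collect the src value of every <img>, in document order.
srcs = REXML::XPath.match(doc, '//img').map { |img| img.attributes['src'] }
p srcs # => ["/a.jpg", "/b.png"]
```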

Upvotes: 3

Tim Peters

Reputation: 4144

Use an HTML/XML parser; in Ruby, Nokogiri is a great choice. Example:

require 'nokogiri'
html = "<p><img alt=\"\" src=\"/ckeditor_assets/pictures/35/content_raw_lemon_cheesecake.jpg\" style=\"height:533px; width:800px\" /></p>\r\n"
doc = Nokogiri::HTML(html)
src = doc.xpath("//img")[0]['src']

In this example, xpath returns all img nodes, [0] picks the first one, and ['src'] returns its src attribute as a string.

Upvotes: 4
