rj487

Reputation: 4634

How to extract all attributes from img tag

I was trying to use Nokogiri to turn:

<img class="img-responsive" src="img/logologo.png" alt=""> 

to:

<%= image_tag('img/logologo.png', :class => 'img-responsive', :alt => '') %>

Here is my code:

a = '<img class="img-responsive" src="img/logologo.png" alt="" width="256" height="256">'
page = Nokogiri::HTML(a)
img = page.css('img')[0]
src =  ""
alt =  ""
class_atr = ""
src =  img['src'] if img['src'].present?
alt =  img['alt'] if img['alt'].present?
class_atr = img['class'] if img['class'].present?
result = "<%= image_tag(\'" + src + '\', :class => \'' + class_atr + '\', :alt => \'' + alt + '\')%>'

This feels hard-coded. Is there a way to extract all of the attributes, including src, without naming each one?

The image tag might contain height or width parameters. How do I extract all attributes automatically and make them into ERB?

Upvotes: 1

Views: 1400

Answers (2)

the Tin Man

Reputation: 160551

OK, there are lots of things to work on. Let's start with how you're parsing the HTML. If all you're doing is parsing a snippet or a single tag, use DocumentFragment so Nokogiri doesn't add the usual HTML wrapper tags. Compare what Nokogiri::HTML does with the snippet:

require 'nokogiri'
doc = Nokogiri::HTML('<img class="img-responsive" src="img/logologo.png" alt="">')
doc.to_html # => "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\">\n<html><body><img class=\"img-responsive\" src=\"img/logologo.png\" alt=\"\"></body></html>\n"

Instead, parse it as a fragment:

doc = Nokogiri::HTML::DocumentFragment.parse('<img class="img-responsive" src="img/logologo.png" alt="">')
doc.to_html # => "<img class=\"img-responsive\" src=\"img/logologo.png\" alt=\"\">"

Next, don't use css, xpath or search when you mean at, at_css or at_xpath. Meditate on this:

doc.css('img').class # => Nokogiri::XML::NodeSet
doc.at('img').class # => Nokogiri::XML::Element

doc.css('img')[0].to_html # => "<img class=\"img-responsive\" src=\"img/logologo.png\" alt=\"\">"
doc.css('img').first.to_html # => "<img class=\"img-responsive\" src=\"img/logologo.png\" alt=\"\">"
doc.at('img').to_html # => "<img class=\"img-responsive\" src=\"img/logologo.png\" alt=\"\">"

It's significant that css, xpath and search return a NodeSet, and it's something to remember. at and its variants are equivalent to calling first or [0] on that NodeSet, so they return the first matching node. Use at and friends when that's what you mean; the resulting code is less noisy.

Here's how I'd go about it:

require 'nokogiri'
doc = Nokogiri::HTML::DocumentFragment.parse('<img class="img-responsive" src="img/logologo.png" alt="">')

img = doc.at('img')
img_src = img.delete('src')
img_params = img.map { |p, v| ":%s => '%s'" % [p, v] }.join(', ') 
# => ":class => 'img-responsive', :alt => ''"

img_template = "<%%= image_tag('%s', %s) %%>" % [img_src, img_params]  
# => "<%= image_tag('img/logologo.png', :class => 'img-responsive', :alt => '') %>"

Of course, the :k => "v" (hash-rocket) format for key/value pairs is old-school. I'd recommend changing to:

img_params = img.map { |p, v| "%s: '%s'" % [p, v] }.join(', ') # => "class: 'img-responsive', alt: ''"

which results in:

"<%= image_tag('img/logologo.png', class: 'img-responsive', alt: '') %>"
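The format strings above assume that the tag has at least one attribute besides src and that no value contains a single quote. A minimal sketch of a more defensive formatter follows, in plain Ruby so it runs without Nokogiri. erb_image_tag is a hypothetical helper name, not part of Nokogiri or Rails, and it assumes the attributes arrive as [name, value] string pairs, such as the ones the img.map block above receives.

```ruby
# Hypothetical helper (not part of Nokogiri or Rails): builds an ERB
# image_tag call from a src string and a list of [name, value] pairs.
def erb_image_tag(src, attrs)
  # Wrap a value in single quotes, escaping embedded quotes and backslashes.
  quote = ->(s) { "'" + s.to_s.gsub(/['\\]/) { |c| "\\#{c}" } + "'" }

  params = attrs.map do |name, value|
    # Names such as data-id are not valid bare hash keys, so quote them.
    key = name.match?(/\A[a-z_][a-zA-Z0-9_]*\z/) ? "#{name}:" : "#{quote.(name)}:"
    "#{key} #{quote.(value)}"
  end

  # Joining src and params together avoids a dangling comma when attrs is empty.
  "<%= image_tag(#{[quote.(src), *params].join(', ')}) %>"
end

puts erb_image_tag('img/logologo.png',
                   [['class', 'img-responsive'], ['alt', ''], ['width', '256']])
# prints: <%= image_tag('img/logologo.png', class: 'img-responsive', alt: '', width: '256') %>
```

Width and height need no special handling here: whatever pairs are passed in end up in the output, in the same order.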

Upvotes: 0

vlasiak

Reputation: 348

Use the following code to iterate over all <img> tags inside the HTML markup and get their attributes:

require 'nokogiri'

page = Nokogiri::HTML <<-html
    <img class="img-responsive1" src="img/logologo.png" alt="" width="256" height="256">
    <a href="#">A tag</a>
    <img class="img-responsive2" src="logologo222.png">
html

page.css('img').each do |img_node|
    img_attributes = img_node.attributes.values # list of image attributes

    # e.g., to output key-value pairs:
    img_attributes.each do |attr|
        p [attr.name, attr.value]
    end
end
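The [name, value] pairs printed by that loop are enough to rebuild each tag as an image_tag call. Here is a minimal sketch in plain Ruby, assuming the pairs were collected from a single img_node. The literal array stands in for that step, and the variable names are illustrative only.

```ruby
# Stand-in for: img_node.attributes.values.map { |attr| [attr.name, attr.value] }
pairs = [
  ['class', 'img-responsive1'],
  ['src', 'img/logologo.png'],
  ['alt', ''],
  ['width', '256'],
  ['height', '256']
]

attrs = pairs.to_h          # name => value, keeping the original order
src = attrs.delete('src')   # pull src out; nil if the tag had no src

# Every remaining attribute becomes an image_tag option.
options = attrs.map { |name, value| "#{name}: '#{value}'" }.join(', ')
```

Because src is removed from the hash first, any other attribute the tag happens to carry, including width and height, is passed through without being named in the code.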

Upvotes: 2
