Nokogiri find image src

Question

I need to get the avatar's src attribute from Facebook.

doc = Nokogiri::HTML(URI.open('http://www.facebook.com/zuck'))

Then I tried:

avatar = doc.css('.photoContainer img')

But I received an empty result. How can I get the img src, and why didn't my approach work?

I also tried to find all imgs by XPath, but still received empty results:

Nokogiri::HTML(URI.open('http://www.facebook.com/zuck')).xpath("//img/@src").each do |src|
  puts src
end

Chris Salzberg · Accepted Answer

The problem is that the .photoContainer div you're trying to access is not in the page's actual HTML. It is inserted into the DOM by JavaScript, so Nokogiri can't see it. Nokogiri only parses static HTML and XML; it does not execute scripts.

If you want to access DOM content generated by JavaScript, try a browser automation tool such as Watir or Selenium, which drives a real browser and therefore runs the page's scripts. Also see "Nokogiri parse ajax-loaded content".

UPDATE:

If you're familiar with integration testing using Capybara, you can also use its selectors as a wrapper around a browsing tool like Selenium, which can be a bit tricky to use directly.

So, for example, in a console:

require 'capybara'
require 'capybara/dsl'

include Capybara::DSL
Capybara.default_driver = :selenium

Then visit the page, close the pop-up, and access the element with a CSS selector:

visit('http://www.facebook.com/zuck')
find('a.layerCancel').click
find('.photoContainer img')['src']
#=> "http://profile.ak.fbcdn.net/hprofile-ak-ash3/c23.1.285.285/s160x160/73273_773684942011_2125564_n.jpg"
