user8640638

Why is Nokogiri not finding this img src?

I want to get the image from this URL:

doc_autobip = Nokogiri::HTML(URI.open('https://www.autobip.com/fr/actualite/sappl_mercedes_benz_livraison_de_282_camions_mercedes_benz/16757'))

The img tag is:

<img src="https://www.autobip.com/storage/photos/articles/16757/sappl_mercedes_benz_livraison_de_282_camions_mercedes_benz_2020-08-12-09-1087474.jpg" class="fotorama__img">

Logically, this should work:

src_img = article.css('img.fotorama__img').map { |link| link['src'] }

But I always get src_img = [].

Any ideas, please?

Upvotes: 0

Views: 212

Answers (1)

Kumar

Reputation: 3126

The HTML class fotorama__img is added to the image dynamically by JavaScript. You can see it when you inspect the page in the browser, but it does not appear on the image when you view the page source.

Nokogiri parses the HTML source returned by the server and does not execute the JavaScript on the page, so the class is never present in the document it sees.

You can try something like this, which should work:

require 'nokogiri'
require 'open-uri'

doc_autobip = Nokogiri::HTML(URI.open('https://www.autobip.com/fr/actualite/sappl_mercedes_benz_livraison_de_282_camions_mercedes_benz/16757'))
# the div wrapping the image has the classes "fotorama mnmd-gallery-slider mnmd-post-media-wide"
src_img = doc_autobip.css('.fotorama.mnmd-gallery-slider.mnmd-post-media-wide img').map { |link| link['src'] }

This is just to show that it works. Pick whichever element and classes from the page source best suit your case.

Update:

Or, if you want the page's JavaScript to run before you parse it, you can use Watir:

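require 'nokogiri'
require 'watir'

# open a real browser so the page's JavaScript runs
browser = Watir::Browser.new
browser.goto 'https://www.autobip.com/fr/actualite/sappl_mercedes_benz_livraison_de_282_camions_mercedes_benz/16757'

# parse the rendered HTML, which now includes the dynamically added class
doc = Nokogiri::HTML.parse(browser.html)
src_img = doc.css('img.fotorama__img').map { |link| link['src'] }

browser.close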

Note that Watir drives a real browser, so you will need to install additional drivers to use it.

Upvotes: 1
