Swapnesh Khare

Reputation: 109

Web scraping behance.net

I am trying to parse https://www.behance.net/gallery/35092257/LEmpreinte-du-Geste using Nokogiri.

The parsed page does not include all the META tags in the <head> that I can see by viewing the page source on that page. Any idea why this is so?

This is the code:

require 'nokogiri'
require 'open-uri'

url = 'https://www.behance.net/gallery/35092257/LEmpreinte-du-Geste'
# URI.open replaces the bare open(url), which no longer accepts URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open(url))
puts doc

Upvotes: 0

Views: 757

Answers (1)

Aleksei Matiushkin

Reputation: 121000

This page is built on top of RequireJS, and its scripts build the DOM on the fly in the browser.

Nokogiri is an HTML/XML parser; it is by no means a JavaScript engine. Why do you expect it to execute JavaScript?

wget the page and you’ll see that in fact it contains not one line of HTML, besides tags like <html> and <head>.

You might try passing the downloaded page to Node, but I doubt it would be able to execute it either.

Upvotes: 1
