Reputation: 109
I am trying to parse https://www.behance.net/gallery/35092257/LEmpreinte-du-Geste using Nokogiri.
The parsed document does not include all the <meta> tags in the <head>
that I can see when viewing the page source in the browser. Any idea why?
This is the code:
require 'nokogiri'
require 'open-uri'

url = 'https://www.behance.net/gallery/35092257/LEmpreinte-du-Geste'
# URI.open replaces the bare open(url), which no longer accepts URLs as of Ruby 3.0
doc = Nokogiri::HTML(URI.open(url))
puts doc
Upvotes: 0
Views: 757
Reputation: 121000
This page is built on top of RequireJS, which loads scripts that build the DOM on the fly.
Nokogiri is an HTML/XML parser; it is by no means a JavaScript engine. Why do you expect it to execute JavaScript?
wget the page and you’ll see that it in fact contains hardly a line of HTML besides tags like <html> and <head>.
You might try passing the downloaded page to Node, but I doubt it will be able to execute it either.
Upvotes: 1