Gonzalo

Can I get HTML elements with Nokogiri?

I have a question about Nokogiri. I need to get the HTML elements from a page and the XPath for each one, but I can't figure out how to do it with Nokogiri. The HTML is arbitrary, because I have to parse several pages from different websites.

Upvotes: 1

Views: 3043

Answers (2)

sutch

Reputation: 1295

If you are asking how to get the XPath for each HTML element in a page, then the following should help. This will open and parse a page and then print out the XPath for each element.

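require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open("http://slashdot.com/"))
# traverse visits every node (text nodes included), so keep only elements.
doc.traverse { |node| puts node.path if node.element? }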

Upvotes: 0

Mike Dalessio

Reputation: 1350

If you are asking how to search for a node, you may use either CSS or XPath expressions, like so:

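require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
doc = Nokogiri::HTML(URI.open("http://slashdot.com/"))

# Both calls return a NodeSet; .first takes the first match (nil if none).
node_found_by_css = doc.css("h1").first
node_found_by_xpath = doc.xpath("/html/body//h1").first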

If you are asking how to retrieve the canonical XPath expression for a node once you've found it, use Node#path:

puts node_found_by_css.path # => "/html/body/div[3]/div[1]/div[1]/h1"

Upvotes: 5