Harish Kurup

Reputation: 7505

Method to parse HTML document in Ruby?

Like the DOMDocument class in PHP, is there any class in Ruby (i.e. core Ruby) to parse an HTML document and get the values of its node elements?

Upvotes: 32

Views: 36350

Answers (4)

dineshsprabu

Reputation: 165

Ruby Cheerio: a jQuery-style HTML parser for Ruby, a much simplified version of Nokogiri for crawlers. It is the Ruby version of cheerio, the popular Node.js package.

Follow the link for a simple crawler example.

gem install ruby-cheerio

require 'ruby-cheerio'

jQuery = RubyCheerio.new("<html><body><h1 class='one'>h1_1</h1><h1>h1_2</h1></body></html>")

jQuery.find('h1').each do |head_one|
    p head_one.text
end

# getting attribute values like jQuery.
p jQuery.find('h1.one')[0].prop('h1','class')

# function chaining similar to jQuery.
p jQuery.find('body').find('h1').first.text

Upvotes: 6

microspino

Reputation: 7781

You can also try Oga by Yorick Peterse.

It is an XML/HTML parser written in Ruby that does not require system libraries such as libxml. You can find it at https://github.com/YorickPeterse/oga

Upvotes: 5

Marc-André Lafortune

Reputation: 79552

There is no built-in HTML parser (yet), but some very good ones are available, in particular Nokogiri.
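To make the "no built-in HTML parser" point concrete: the closest thing that ships with Ruby is REXML (a default gem, and a bundled gem since Ruby 3.0), and it is an XML parser. The sketch below, using illustrative variable names, shows that it can query well-formed markup but rejects ordinary HTML that omits end tags.

```ruby
require 'rexml/document'

# Well-formed (XHTML-like) markup parses fine.
xhtml = "<html><body><h1 class='one'>h1_1</h1><h1>h1_2</h1></body></html>"
doc = REXML::Document.new(xhtml)

# XPath queries over the parsed tree.
headings = REXML::XPath.match(doc, '//h1').map(&:text)
first_class = REXML::XPath.first(doc, "//h1[@class='one']").attributes['class']

# Typical HTML with an implicitly closed <p> is rejected, not repaired.
tag_soup = "<html><body><p>one<p>two</body></html>"
rejected =
  begin
    REXML::Document.new(tag_soup)
    false
  rescue REXML::ParseException
    true
  end

p headings
p first_class
p rejected
```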

Meta-answer: for common needs like these, I'd recommend checking out the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers.

Upvotes: 49

Peter

Reputation: 132157

You should check out Hpricot. It's exceedingly good. It's not 'core' Ruby, but it's a commonly used gem.

Upvotes: 9
