Reputation: 7525

Method to parse HTML document in Ruby?

Like the DOMDocument class in PHP, is there any class in Ruby (i.e. core Ruby) to parse an HTML document and get the values of its node elements?

Upvotes: 32

Answers (4)

dineshsprabu

Reputation: 165

Ruby Cheerio is a jQuery-style HTML parser for Ruby, a simplified version of Nokogiri aimed at crawlers. It is the Ruby version of the popular Node.js package cheerio.

Follow the link for a simple crawler example.

gem install ruby-cheerio

require 'ruby-cheerio'

jQuery = RubyCheerio.new("<html><body><h1 class='one'>h1_1</h1><h1>h1_2</h1></body></html>")

jQuery.find('h1').each do |head_one|
    p head_one.text
end

# getting attribute values like jQuery.
p jQuery.find('h1.one')[0].prop('h1','class')

# function chaining similar to jQuery.
p jQuery.find('body').find('h1').first.text

Upvotes: 6

microspino

Reputation: 7781

You can also try Oga by Yorick Peterse.

It is an XML/HTML parser written in Ruby that does not require system libraries such as libxml. You can find it at https://github.com/YorickPeterse/oga

Upvotes: 5

Marc-André Lafortune

Reputation: 79622

There is no built-in HTML parser (yet), but some very good ones are available, in particular Nokogiri.

Meta-answer: For common needs like these, I'd recommend checking out the Ruby Toolbox site. You'll notice that Nokogiri is the top recommendation for HTML parsers.

Upvotes: 49

Peter

Reputation: 132387

You should check out hpricot. It's exceedingly good. It's not 'core' ruby, but it's a commonly used gem.

Upvotes: 9
