Reputation: 1295
I'm having a difficult time locating an HTML parser that works with JRuby.
I've become fond of using Nokogiri for HTML parsing, but Nokogiri requires libxml2.dll, which isn't available on my machine, and I can't be sure it will be available on all users' machines.
I tried another favorite, Scrubyt, but it relies on Mechanize, which also requires Nokogiri.
What Ruby HTML parser do you recommend for use with JRuby?
Upvotes: 1
Views: 319
Reputation: 37507
The pure-Java version of Nokogiri does not depend on libxml2 or any other native binary. See http://wiki.github.com/tenderlove/nokogiri/pure-java-nokogiri-for-jruby.
Hpricot is another popular HTML parsing library, and it also has a pure-Java port. Its functionality is similar to Nokogiri's; in fact, Hpricot was the parser that popularized using CSS selectors for HTML parsing.
Upvotes: 1
Reputation: 14222
Why not use the pure-Java version of Nokogiri?
http://github.com/tenderlove/nokogiri/tree/java
Upvotes: 0