Reputation: 5132
I am trying to parse some HTML with Nokogiri and am running into problems. I want to go through each element with the class "employerReview" and capture the content under "pros" and "cons".
I am stuck on the first step: getting a single item to print to the console.
require 'open-uri'
require 'nokogiri'
doc = Nokogiri::HTML(open('http://www.glassdoor.com/Reviews/Microsoft-Reviews-E1651.htm'))
doc.css('//*[@id="empReview_2320868"]/div[1]/p[1]/tt').each do |link|
  puts link.content
end
Upvotes: 2
Views: 272
Reputation: 14082
You've passed an XPath expression to the css method, which expects a CSS selector.
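require 'open-uri'
require 'nokogiri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open('http://www.glassdoor.com/Reviews/Microsoft-Reviews-E1651.htm'))
# Select the first two paragraphs (pros and cons) of each review description
ps = doc.xpath('//div[@class="employerReview"]//div[@class="description"]/p[position()<3]')
ps.map { |para| para.text.strip }.each_slice(2) do |pros, cons|
  puts pros
  puts cons
end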
The XPath above includes the "Pros -" and "Cons -" labels in the output. If that's not what you want, you can change the XPath to:
//div[@class="employerReview"]//div[@class="description"]/p[position()<3]/tt
Upvotes: 0
Reputation: 7583
Here is one way to get closer to finding the data you are looking for by using CSS, instead of XPath:
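require 'open-uri'
require 'nokogiri'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open('http://www.glassdoor.com/Reviews/Microsoft-Reviews-E1651.htm'))
# Each <strong> holds a label; the matching text sits in <tt> tags under the same <p>
doc.css('div.employerReview > div.description > p > strong').each do |item|
  puts item.content
  item.parent.css('tt').each do |details|
    puts details.content
  end
end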
Upvotes: 0
Reputation: 160551
One problem is that you're passing an XPath expression to a method that expects a CSS selector:
doc.css('//*[@id="empReview_2320868"]/div[1]/p[1]/tt')
You can use search or xpath for XPath expressions instead.
That doesn't find the nodes you want though. A simple test shows they don't exist:
doc.css("#empReview_2320868")
should return something, but it returns [], meaning no tag in the document has that ID.
Upvotes: 1