Reputation: 309
It might just be a silly bug in my code that I haven't found yet, but it has been taking me quite some time. When I parse websites with Nokogiri and XPath and try to save the content of the matched nodes to a .csv file, the CSV file ends up with empty cells.
So either the XPath expressions return nothing, or my code doesn't read the websites properly.
This is what I'm doing:
require 'open-uri'
require 'nokogiri'
require 'csv'

CSV.open("neverend.csv", "w") do |csv|
  csv << ["kuk","date","name"]
  # first, open the urls from a document. The urls are correct.
  File.foreach("neverendurls.txt") do |line|
    # second, the loop for each url
    searchablefile = Nokogiri::HTML(open(line))
    # third, the xpaths. These work when I try them on the website.
    kuk = searchablefile.at_xpath("(//tbody/tr/td[contains(@style,'60px')])[1]")
    date = searchablefile.at_xpath("(//tbody/tr/td[contains(@style,'60px')])[1]/following-sibling::*[1]")
    name = searchablefile.at_xpath("(//tbody/tr/td[contains(@style, '60px')])[1]/following-sibling::*[2]")
    # fourth, saving the xpaths
    csv << [kuk,date,name]
  end
end
What am I missing here?
Upvotes: 0
Views: 54
Reputation: 55002
It's impossible to tell from what you posted, but let's clean that hot mess up with CSS selectors:
kuk = searchablefile.at 'td[style*="60px"]'
date = searchablefile.at 'td[style*="60px"] + *'
name = searchablefile.at 'td[style*="60px"] + * + *'
Upvotes: 1