Uzzar

Reputation: 703

Nokogiri and XPath: saving text result of scrape

I would like to save the text results of a scrape in a file. This is my current code:

require "rubygems"
require "open-uri"
require "nokogiri"

class Scrapper
  attr_accessor :html, :single

  def initialize(url)
    # URI.open is needed on Ruby 3.0+, where open-uri no longer patches Kernel#open
    download = URI.open(url)
    @page = Nokogiri::HTML(download)
    @html = @page.xpath('//div[@class = "quoteText" and following-sibling::div[1][@class = "quoteFooter" and .//a[@href and normalize-space() = "hard-work"]]]')
  end

  def get_quotes
    @quotes_array = @html.collect {|node| node.text.strip}
    @single = @quotes_array.each do |quote|
      quote.gsub(/\s{2,}/, " ") 
    end
  end
end

I know that I can write a file like this:

File.open('text.txt', 'w') do |fo|
  fo.write(content)
end

but I don't know how to incorporate @single, which holds the results of my scrape. My ultimate goal is to insert the information into a database.

I have come across some folks using YAML, but I am finding it hard to follow the step-by-step guides.

Can anyone point me in the right direction?

Thank you.

Upvotes: 0

Views: 369

Answers (1)

the Tin Man

Reputation: 160581

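In get_quotes, each returns the original array and throws away whatever the block returns, and gsub does not change the string in place, so @single never holds the cleaned-up text. Just use map instead: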

@single = @quotes_array.map do |quote|
  quote.squeeze(' ')
end

File.open('text.txt', 'w') do |fo|
  fo.puts @single
end
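
To see why map matters here, this is a minimal sketch comparing it with each. The sample strings and the variable names (quotes, with_each, with_map) are made up for illustration and stand in for the scraped text:

```ruby
# Hypothetical sample data standing in for the scraped quotes.
quotes = ["hard   work  pays", "keep    going"]

# each returns the receiver itself; the block's return values are discarded,
# and gsub (without !) leaves the original strings untouched.
with_each = quotes.each { |q| q.gsub(/\s{2,}/, " ") }

# map builds a new array out of the block's return values.
with_map = quotes.map { |q| q.gsub(/\s{2,}/, " ") }

p with_each
p with_map
```

Only with_map contains the collapsed strings; with_each is the same array that went in.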

Or:

File.open('text.txt', 'w') do |fo|
  fo.puts @quotes_array.map{ |q| q.squeeze(' ') }
end

and don't bother creating @single.

Or:

File.open('text.txt', 'w') do |fo| 
  fo.puts @html.collect { |node| node.text.strip.squeeze(' ') }
end

and don't bother creating @single or @quotes_array.
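
All three variants rely on puts writing each element of an array on its own line. A small round-trip sketch, assuming a temporary directory and made-up quotes (the names quotes, path and lines are illustrative only):

```ruby
require "tmpdir"

# Hypothetical data standing in for the scraped quotes.
quotes = ["first quote", "second quote"]
lines = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "text.txt")

  # puts with an array writes one element per line.
  File.open(path, "w") { |fo| fo.puts quotes }

  # Read the file back, dropping the trailing newline of each line.
  lines = File.readlines(path, chomp: true)
end

p lines
```

Reading the file back line by line gives the original array again, which is a convenient starting point for a later database insert.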

squeeze is part of the String class. This is from the documentation:

"  now   is  the".squeeze(" ")         #=> " now is the"

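One difference from the original gsub(/\s{2,}/, " ") is worth keeping in mind: squeeze(" ") only collapses runs of spaces, not newlines or tabs. A short sketch with a made-up string, in case the scraped text contains line breaks:

```ruby
# Hypothetical text containing both repeated spaces and newlines.
text = "hard  work\n\n  pays"

# squeeze(" ") collapses only repeated spaces; the newlines stay.
squeezed = text.squeeze(" ")

# A regex on \s+ collapses every run of whitespace into one space.
collapsed = text.gsub(/\s+/, " ")

p squeezed
p collapsed
```

If the quotes should end up on a single line each, the gsub form is probably the safer choice.
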
Upvotes: 1
