rubyist

Reputation: 3132

Access variables across methods

I am trying to scrape an XML website and extract its contents.

class PageScraper
  def get_page_details         
    if xml_data
      #get the info  from xml website
    else
      #get it from html website
    end
  end
  def get_xml_details
    if xml_data
      #get it from xml website
    end
  end
  def xml_data
    xml_url = 'www.abcd.xml'
    #Download and parse the xml data from abcd.xml site using nokogiri-gem
  end
end

Several other methods need the result of xml_data. As written, every call goes back to the XML website and downloads the data again.

Is there a way to store the XML data in a variable (like @data = xml_data()) the first time it is called and return the downloaded data? Every subsequent call to xml_data should then return the cached @data instead of downloading again.

Upvotes: 0

Views: 39

Answers (1)

the Tin Man

Reputation: 160611

Why aren't you using OpenURI and Nokogiri? The normal process of retrieving and parsing the XML will do what you want. The Nokogiri site is full of examples.

As far as your class goes, you probably need a method to retrieve the page, which will also store it in an instance or class variable, which is your choice depending on whether the class is responsible for multiple pages or only one.
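
For the caching part of the question, one common Ruby idiom is to memoize the result in an instance variable with `||=`. The following is only a minimal sketch under some assumptions: the `fetcher` argument and the `fetch_count` counter are hypothetical names added for illustration, and the fetcher is a plain lambda so the example runs without a network connection. In real use it might be something like `->(url) { Nokogiri::XML(URI.open(url)) }`.

```ruby
class PageScraper
  attr_reader :fetch_count

  # fetcher: hypothetical callable that takes a URL and returns the XML data.
  def initialize(url, fetcher)
    @url = url
    @fetcher = fetcher
    @fetch_count = 0 # only here to show how often the download really runs
  end

  def get_page_details
    xml_data ? 'details from XML' : 'details from HTML'
  end

  def get_xml_details
    xml_data
  end

  # The block runs only while @xml_data is nil or false; once a truthy
  # value has been stored, every later call returns the stored value.
  def xml_data
    @xml_data ||= begin
      @fetch_count += 1
      @fetcher.call(@url)
    end
  end
end

# Stand-in for a real download, so the sketch is self-contained.
fake_fetcher = ->(url) { "<root><item>from #{url}</item></root>" }

scraper = PageScraper.new('http://www.example.com/data.xml', fake_fetcher)
scraper.get_page_details
scraper.get_xml_details
puts scraper.fetch_count # prints 1
```

One caveat with `||=`: if the fetch returns nil or false, nothing is cached and the next call tries again. Whether that is desirable depends on how failures should be handled.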

As an example, here's some code for parsing HTML, which is almost identical to what would be done for parsing XML. The only real difference would be using Nokogiri::XML instead of Nokogiri::HTML:

require 'open-uri'
require 'nokogiri'

class PageScraper

  def initialize(url)
    @source = URI.open(url).read
    @dom = Nokogiri::HTML(@source)
  end

  def errors?
    !@dom.errors.empty?
  end

  def title
    @dom.title
  end

  def head
    @dom.at('head')
  end

  def body
    @dom.at('body')
  end

end

Of course you'd change the accessors for various elements like head and body to match your particular use-case.

After running that, both the raw HTML (or XML) and the parsed DOM would be available as instance variables, allowing you to easily refer to either. It's usually not necessary to keep @source, since the markup can be regenerated using @dom.to_xml or @dom.to_html. The exception is when there are errors in the source: Nokogiri will try to fix up the document, possibly causing it to differ from the original.

It'd be used something like:

page_scraper = PageScraper.new('http://www.example.com')
abort "HTML errors found" if page_scraper.errors? 

page_title_text = page_scraper.title
page_scraper.head.at('title').content = 'Foo bar'
page_css = page_scraper.head.at('style').text

Upvotes: 1
