roccia

Reputation: 179

How should I use a recursive method in Ruby?

I wrote a simple web crawler using Mechanize, and now I'm stuck on how to get the next page recursively. Below is the code.

def self.generate_page  # generate a Mechanize page object for the first page
    agent = Mechanize.new
    url = "http://www.baidu.com/s?wd=intitle:#{URI.encode(WORD)}%20site:sina.com.cn&rn=50&gpc=stf#{URI.encode(TIME)}"
    page = agent.get(url)
    page
end

def self.next_page(n_page)  # get the next page recursively by clicking the "next" link shown on each page
    puts n_page
# If I don't use puts, I get nothing. When using puts, I get:
#<Mechanize::Page:0x007fd341c70fd0>
#<Mechanize::Page:0x007fd342f2ce08>
#<Mechanize::Page:0x007fd341d0cf70>
#<Mechanize::Page:0x007fd3424ff5c0>
#<Mechanize::Page:0x007fd341e1f660>
#<Mechanize::Page:0x007fd3425ec618>
#<Mechanize::Page:0x007fd3433f3e28>
#<Mechanize::Page:0x007fd3433a2410>
#<Mechanize::Page:0x007fd342446ca0>
#<Mechanize::Page:0x007fd343462490>
#<Mechanize::Page:0x007fd341c2fe18>
#<Mechanize::Page:0x007fd342d18040>
#<Mechanize::Page:0x007fd3432c76a8>  
# which are the results I want

    np = Mechanize.new.click(n_page.link_with(:text=>/next/)) unless n_page.link_with(:text=>/next/).nil?
    result = next_page(np) unless np.nil?
    result    # here the value is empty, I don't know what is wrong
end

def self.get_page  # trying to print the result of the next_page method
    puts next_page(generate_page)
    # it seems the result is never passed back here
end

I followed these two links, "What is recursion and how does it work?" and "Ruby recursive function", but still can't figure out what's wrong. I hope someone can help me out. Thanks.

Upvotes: 0

Views: 157

Answers (1)

max pleaner

Reputation: 26758

There are a few issues with your code:

  1. You shouldn't be calling Mechanize.new more than once; every new agent starts with its own empty cookie jar and history.
  2. From a stylistic perspective, you are doing too many nil checks.

Unless you have a preference for recursion, it'll probably be easier to do it iteratively.
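If you do prefer recursion, the reason your original next_page prints every page but returns nothing is that the deepest call (the one with no next link) returns nil, and every caller simply hands that nil back up. A recursive method needs a base case that returns something useful, and each level has to combine its own page with what the deeper call returned. Here is a minimal sketch of that shape. It assumes a made-up StubPage struct in place of Mechanize::Page so it runs without network access; StubPage, following and collect_pages are illustrative names, not part of Mechanize.

```ruby
# StubPage stands in for Mechanize::Page (hypothetical, for illustration only).
# `following` is the page that clicking "next" would return, or nil on the last page.
StubPage = Struct.new(:name, :following)

# Returns the given page plus every page reachable from it, in order.
def collect_pages(page)
  return [] if page.nil?                   # base case: nothing left to visit
  [page] + collect_pages(page.following)   # this page, then everything after it
end

third  = StubPage.new("page 3", nil)
second = StubPage.new("page 2", third)
first  = StubPage.new("page 1", second)

puts collect_pages(first).map(&:name).inspect
# prints ["page 1", "page 2", "page 3"]
```

With Mechanize, the page.following step would be the click on the next link. Each page adds a stack frame, so for long result lists a loop is the safer choice.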

To have your next_page method return an array containing every page in the chain, you could write this:

require 'mechanize'

# create the Mechanize agent once, store it in a constant, and reuse it everywhere
Agent = Mechanize.new

# a helper method to DRY up the code
# returns the next page, or nil when the page has no "next" link
def click_to_next_page(page)
  link = page.link_with(:text => /next/)
  link && Agent.click(link)
end

# repeatedly visits the next page until none exists
# returns all pages seen after the starting page as an array
def get_all_next_pages(page)
  results = []
  # the assignment doubles as the loop condition: stop once the helper returns nil
  while (page = click_to_next_page(page))
    results.push(page)
  end
  results
end

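# testing it out (I haven't actually run this)
# WORD and TIME are the constants from the question
# note: URI.encode was removed in Ruby 3.0; URI.encode_www_form_component is one replacement
base_url = "http://www.baidu.com/s?wd=intitle:#{URI.encode(WORD)}%20site:sina.com.cn&rn=50&gpc=stf#{URI.encode(TIME)}"
root_page = Agent.get(base_url)
next_pages = get_all_next_pages(root_page)
puts next_pages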
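Since the snippet above was not run against the real site, one way to check the loop logic without touching the network is to swap in small stand-in objects. This is only a sketch under some assumptions: FakePage, FakeLink and FakeAgent are hypothetical classes that imitate just the two calls the loop relies on (link_with on a page and click on the agent), and the agent is passed in as an argument rather than read from a constant so the stand-in can be substituted.

```ruby
# Hypothetical stand-ins for Mechanize objects, just enough for the loop.
FakeLink = Struct.new(:target)

class FakePage
  attr_reader :name

  def initialize(name, next_link = nil)
    @name = name
    @next_link = next_link
  end

  # Same shape as Mechanize::Page#link_with: returns a link or nil.
  def link_with(_criteria)
    @next_link
  end
end

class FakeAgent
  # Same shape as Mechanize#click: follows the link and returns the new page.
  def click(link)
    link.target
  end
end

# Returns the next page, or nil when there is no "next" link.
def click_to_next_page(agent, page)
  link = page.link_with(text: /next/)
  link && agent.click(link)
end

# Collects every page after the starting page.
def get_all_next_pages(agent, page)
  results = []
  while (page = click_to_next_page(agent, page))
    results.push(page)
  end
  results
end

last   = FakePage.new("page 3")
middle = FakePage.new("page 2", FakeLink.new(last))
root   = FakePage.new("page 1", FakeLink.new(middle))

puts get_all_next_pages(FakeAgent.new, root).map(&:name).inspect
# prints ["page 2", "page 3"]
```

Note that the starting page is not part of the result. If you want it included, begin with results = [page] instead of an empty array.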

Upvotes: 2
