David Geismar

Reputation: 3422

Scraping successive pages until the last page using Nokogiri and Mechanize

I am trying to scrape multiple pages from a website. I want to scrape a page, then click on next, get that page, and repeat until I hit the end. I wrote this so far:

page = agent.submit(form, form.buttons.first)
#submitting a form
while lien = page.link_with(:text=>'Next')
  # while I have a next link on page, keep scraping
  html_body = Nokogiri::HTML(body)
  links = html_body.css('.list').xpath("//table/tbody/tr/td[2]/a[1]")
  links.each do |link|
    purelink = link['href']
    puts purelink[/codeClub=([^&]*)/].gsub('codeClub=', '')
    lien.click
  end
end

Unfortunately, this script keeps scraping the same page in an infinite loop. How can I make it advance to the next page each time?

Upvotes: 1

Views: 882

Answers (2)

pguardiario

Reputation: 55012

It should look more like this:

page = form.submit(form.button)
scrape(page) # scrape stands for your own extraction code

while (link = page.link_with(text: 'Next'))
  page = link.click # follow "Next" and keep the page it returns
  scrape(page)
end

Also, you don't need to parse the page body with Nokogiri yourself. Mechanize already does that for you: page.search and page.parser give you the parsed Nokogiri document.
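As a rough idea of what scrape could look like without a separate Nokogiri parse, here is a minimal sketch. It reuses the XPath and the codeClub parameter from the question; club_code is a hypothetical helper name, and the method assumes page is a Mechanize::Page (anything that responds to search would do).

```ruby
# Hypothetical helper: pull the codeClub query parameter out of one href.
# Returns nil when the href is missing or has no codeClub parameter.
def club_code(href)
  href.to_s[/codeClub=([^&]*)/, 1]
end

# Assumes `page` is a Mechanize::Page; #search runs the query on the
# Nokogiri document that Mechanize has already parsed.
def scrape(page)
  page.search('//table/tbody/tr/td[2]/a[1]')
      .map { |a| club_code(a['href']) }
      .compact
end
```

Using the capture-group form of String#[] avoids the extra gsub from the question, and it returns nil instead of raising when a link has no codeClub parameter.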

Upvotes: 0

Kimball

Reputation: 1281

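I would try this: replace lien.click with page = lien.click. Without the reassignment, page never changes, so the while condition keeps finding the Next link of the first page.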
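The reassignment can also be wrapped in a small helper so that the click happens once per page rather than once per scraped link. This is only a sketch: each_page is a hypothetical name, and it assumes the page objects behave like Mechanize pages, that is, link_with returns a link or nil and click returns the next page.

```ruby
# Hypothetical helper: yields the starting page, then every page reached
# by following the link whose text is `next_text`, until no such link exists.
def each_page(page, next_text: 'Next')
  loop do
    yield page
    link = page.link_with(text: next_text)
    break unless link

    page = link.click # reassign, so the next lookup runs on the new page
  end
end
```

With Mechanize it might be called as each_page(agent.submit(form, form.buttons.first)) { |page| ... }, with the per-page extraction code inside the block.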

Upvotes: 1
