Blake Geist

Reputation: 285

Rails, Scraping from dynamic URL

At its most basic, I want to scrape a website and render parts of its markup, such as all the H1s. I have used Nokogiri and Mechanize before and am familiar with the basics of scraping. In the past I would structure a Thor task like this:

    class Scrape < Thor
      desc "cl_redding", "Scrape Craigslist for Rentals"
      def cl_redding
        # Load the Rails environment so the task can use the app's code
        require File.expand_path('config/environment.rb')
        require 'rubygems'
        require 'nokogiri'
        require 'open-uri'
        require 'mechanize'
        require 'yaml'
        require 'aws-sdk'
        require 'csv'
        require 'json'

        agent = Mechanize.new
        # The URL is hard-coded here
        page = agent.get('http://redding.craigslist.org/search/apa?zoomToPosting=&catAbb=apa&query=&minAsk=&maxAsk=&bedrooms=&housing_type=&hasPic=1&excats=')
        # (rest of the task not shown)
      end
    end

This works fine, but it only scrapes Craigslist, because the URL is hard-coded in the `page =` call. What I am asking is: how would I scrape a site whose URL is submitted through an input box on a website? Specific help, tutorials, advice, or resources are welcome.

Upvotes: 0

Views: 160

Answers (1)

Rafal

Reputation: 2576

I think your question is a bit too generic.

  • Start a Rails app
  • Build a form that accepts the URL to scrape, and possibly add a Page model to store the pages to scrape
  • Fetch and parse the submitted URL the same way you do in your example
  • Possibly use a background processing tool like Sidekiq so the scraping does not run inside the web request
  • Store the results and display them on Page#show
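
To make the "accept a URL, then parse it" steps a bit more concrete, here is a minimal sketch, not a definitive implementation. The helper names `normalize_scrape_url` and `scrape_h1s` are hypothetical, and it assumes the mechanize gem from the question is installed. The idea is to check the user's input before handing it to Mechanize, since a URL typed into a form cannot be trusted the way a hard-coded one can.

```ruby
require 'uri'

# Hypothetical helper: turn whatever the user typed into an http(s) URI,
# or return nil if the input cannot be used as one.
def normalize_scrape_url(input)
  candidate = input.to_s.strip
  return nil if candidate.empty?

  # Assume http when the user leaves the scheme off ("example.com/foo").
  candidate = "http://#{candidate}" unless candidate.match?(%r{\A[a-z][a-z0-9+.-]*://}i)

  uri = URI.parse(candidate)
  # URI::HTTPS is a subclass of URI::HTTP, so this accepts both schemes only.
  return nil unless uri.is_a?(URI::HTTP) && uri.host && !uri.host.empty?

  uri
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper: fetch the page and return the text of every <h1>.
# Returns an empty array for unusable input or non-HTML responses.
def scrape_h1s(input)
  uri = normalize_scrape_url(input)
  return [] if uri.nil?

  require 'mechanize' # loaded lazily; assumes the gem is available
  page = Mechanize.new.get(uri.to_s)
  return [] unless page.is_a?(Mechanize::Page)

  page.search('h1').map { |h1| h1.text.strip }
end
```

In a Rails app you would probably call something like `scrape_h1s(page.url)` from a background job rather than from the controller, then save the returned strings for Page#show. The `url` attribute on the Page model is an assumption here, not something from the question.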

Upvotes: 1
