Talk2MeGooseman

Reputation: 143

How to scrape pages originating from the same domain on Heroku

I created a Facebook-style URL scraper for posting content.

When someone inputs a URL, it sends a request to the backend, where I use Nokogiri to scrape the URL and pull the information needed to construct the post.

It works fine for other websites such as apple.com and sony.com, but when I use a link from my own domain ("mywebsite.com") it times out. No error is displayed other than Heroku timing out the request after 30 seconds. If I scrape my domain from localhost on my computer, it works.

Is there some kind of origin rule preventing Nokogiri from scraping pages originating from the same domain?

I'm using Ruby on Rails 3.1.10, Nokogiri 1.4.7, and the Heroku Cedar stack.

Upvotes: 0

Views: 277

Answers (1)

mind.blank

Reputation: 4880

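Is the scraping run in a background job or inside the web request? Do you have only 1 dyno? If your app has only 1 web process, it might be busy handling the scrape request and therefore can't serve the page being scraped, so the scrape waits on your own app until Heroku times out the request.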

Try scaling your dynos to 2 and see if the problem persists.

heroku ps:scale web=2
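If scaling is not an option, another angle is to avoid the self-request or at least make it fail fast. The following is only a rough sketch under assumptions, not something from your app: `same_host?` and `fetch_html` are hypothetical helper names, the timeout values are arbitrary, and the keyword arguments assume a newer Ruby than the one Rails 3.1 typically ran on.

```ruby
require "uri"
require "net/http"

# Hypothetical helper: true when the URL points back at the app's own host.
# A leading "www." is ignored on both sides; unparsable input returns false.
def same_host?(url, app_host)
  host = URI.parse(url).host
  return false if host.nil?
  host.downcase.sub(/\Awww\./, "") == app_host.downcase.sub(/\Awww\./, "")
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: fetch a page body with short timeouts, so a stuck
# request raises Net::OpenTimeout or Net::ReadTimeout well before
# Heroku's 30 second limit is reached.
def fetch_html(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: open_timeout,
                  read_timeout: read_timeout) do |http|
    http.get(uri.request_uri).body
  end
end
```

With something like this, the controller could check `same_host?(url, "mywebsite.com")` first and build the post from your own data instead of making an HTTP round trip, and pass the result of `fetch_html` to `Nokogiri::HTML` for everything else. Note that the timeouts alone won't make a self-scrape succeed on a single busy web process; they only turn the hang into an error you can rescue.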

Upvotes: 1
