Scraping Reddit using Nokogiri (429 too many requests)

Question

I'm trying to scrape Reddit with Nokogiri, but even a single run of the code below fails with an error saying I'm making too many requests.

require 'nokogiri'
require 'open-uri'
url = "https://www.reddit.com/r/all"
# URI.open is needed on Ruby 3.0+, where a bare open(url) no longer fetches URLs
redditscrape = Nokogiri::HTML(URI.open(url))

OpenURI::HTTPError: 429 Too Many Requests

Isn't this only one request? If it's not, how do I create sleep intervals for Nokogiri?

sump · Accepted Answer

Reddit has an API

You could probably query the API for the particular subreddit(s) you want to scrape. Attempting to scrape all of Reddit seems like a nightmare waiting to happen, given the high volume of posts and the nested comments.

It looks like Reddit is blocking scraping of its HTML pages in favor of its public API.
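
As a rough illustration of the API route, here is a minimal sketch. It assumes Reddit's JSON listing endpoint (the subreddit URL with .json appended) and assumes the 429 is caused by open-uri's default User-Agent being throttled, which I have not verified for your case. The names USER_AGENT, listing_url, post_titles and fetch_listing are made up for this example, and the retry counts and wait times are arbitrary. It also shows one way to add the sleep intervals you asked about: sleeping around the HTTP request, since Nokogiri itself only parses and never makes requests.

```ruby
require 'json'
require 'open-uri'

# Hypothetical User-Agent; replace it with a string describing your own script.
USER_AGENT = 'ruby:example-reddit-reader:v0.1 (by /u/your_username)'

# Build the JSON listing URL for a single subreddit.
def listing_url(subreddit, limit: 25)
  "https://www.reddit.com/r/#{subreddit}.json?limit=#{limit}"
end

# Extract the post titles from a listing response body.
def post_titles(json_body)
  listing = JSON.parse(json_body)
  listing.fetch('data').fetch('children').map do |child|
    child.fetch('data').fetch('title')
  end
end

# Fetch a listing; on a 429, sleep and retry with a doubled wait.
# Any other HTTP error, or a 429 after the last retry, is re-raised.
def fetch_listing(subreddit, retries: 3, wait: 5)
  URI.open(listing_url(subreddit), 'User-Agent' => USER_AGENT).read
rescue OpenURI::HTTPError => e
  raise unless e.io.status[0] == '429' && retries.positive?

  sleep(wait)
  fetch_listing(subreddit, retries: retries - 1, wait: wait * 2)
end
```

Calling post_titles(fetch_listing('ruby')) should then give you an array of title strings for that subreddit. Because the response is JSON, Nokogiri is not needed for this step at all.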
