Jackson Henley

Reputation: 1531

Experience with web crawlers on Heroku

Does anybody have experience coding web crawlers with gems such as anemone and deploying them to Heroku for their own personal use? Would such a continuously running program violate any of Heroku's TOA/TOS?

Upvotes: 4

Views: 2313

Answers (2)

berezovskyi

Reputation: 3491

Not any more.

The Heroku Acceptable Use Policy states in Prohibited Actions, p.21, that a crawler must

  • identify itself via a unique User Agent
  • obey robots.txt (including crawl-delay directive)
  • not be used as an "open proxy" (this requirement stems from p.20)
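
As a rough illustration of the first two requirements, here is a minimal Python sketch (not Ruby/anemone) that checks robots.txt rules and reads the crawl-delay with the standard library's `urllib.robotparser`. The User-Agent string, the `example.com` URLs, and the helper names `build_policy` and `may_fetch` are made up for the example; the one-second fallback delay is an assumption, not something the policy specifies.

```python
import urllib.robotparser

# Hypothetical User-Agent: unique to this crawler, with a contact URL.
USER_AGENT = "my-personal-crawler/0.1 (+https://example.com/bot-info)"


def build_policy(robots_lines, user_agent=USER_AGENT, default_delay=1.0):
    """Parse robots.txt lines and return (parser, delay in seconds).

    In a real crawler you would call parser.set_url(...) and parser.read()
    instead of passing the lines in by hand.
    """
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_lines)
    delay = parser.crawl_delay(user_agent)
    # Fall back to an assumed polite default when no Crawl-delay is given.
    return parser, (delay if delay is not None else default_delay)


def may_fetch(parser, url, user_agent=USER_AGENT):
    """Return True if robots.txt allows this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

The same `USER_AGENT` string would also need to be sent in the `User-Agent` header of every request the crawler makes.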

NB! A free instance must not exceed 18 hours of work a day.

Upvotes: 2

Ashitaka

Reputation: 19203

I don't have any experience with running web crawlers on Heroku (I would actually be interested in reading about that!). But here are my points:

  1. This is its prohibited content. Illegal activity is prohibited (duh) and since some sites "prohibit" web crawlers and screen scrapers (such as IMDb), that could be considered illegal. But let's ignore this for now.

  2. These are its prohibited actions. The following is prohibited:

    data mining any web property (including Heroku) to find email addresses or other user account information;

  3. These are its usage limits:

    • Network Bandwidth: 2TB/month - Soft
    • Shared DB processing: Max 200msec per second CPU time - Soft
    • Dyno RAM usage: 512MB - Hard
    • Slug Size: 200MB - Hard
    • Request Length: 30 seconds - Hard
  4. In its TOS, point 2.5., it's explained:

    Repeated exceeding of the hard or soft usage limits may lead to termination of your account.

Emphasis is mine. Heroku gives each app 750 free dyno hours per month. As long as you don't abuse Heroku's services and don't use it to gather personal info, I believe you're in the clear. I suggest:

  1. Somehow cap your web crawler. Just as you should limit the rate of API requests, you should have the common courtesy of limiting the speed of your crawler.

  2. Keep an eye on your dyno hours. You can do so here.
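
One possible way to cap a crawler, sketched in Python: enforce a minimum interval between consecutive requests. The `Throttle` class and its parameters are hypothetical names for this example, and the interval itself is up to you (a site's crawl-delay, if it sets one, is a sensible lower bound).

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each fetch keeps the request rate at or below one per `min_interval` seconds, which also helps keep bandwidth use predictable.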

Upvotes: 1
