Thanks for all the fish

Reputation: 1701

How to block visits from Ruby Mechanize Gem?

I'm starting to use the Mechanize gem for Ruby, and I wonder if there is any way a web server can detect and block activity coming from a Mechanize agent?

If yes, what code or steps would block Mechanize from scraping or visiting a site?

Upvotes: 3

Views: 1092

Answers (2)

the Tin Man

Reputation: 160571

There are a number of ways they can detect that an automated process is hitting their site:

  • They can check the User-Agent string (see the sketch after this list).
  • They can see what you request. A browser fetches all the images and CSS referenced by an HTML page; Mechanize won't by default.
  • A human pauses to read a page and understand it. Code doesn't: unless it's been programmed to pause, it runs at full speed, so requests follow one another in quick succession.

These don't necessarily point to Mechanize running, but are fingerprints of code scraping a site.
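The user-agent point is the easiest to see: Mechanize announces itself in its default User-Agent header unless you change it. A minimal sketch that just prints it:

    require 'mechanize'

    agent = Mechanize.new
    # The default User-Agent names the library (something like "Mechanize/..."),
    # which a server can match and ban trivially.
    puts agent.user_agent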

What can they do about it?

  • Ban that user-agent.
  • Ban any requests from your IP address, domain, or subnet.
  • Ban any requests from your IP address, domain, or subnet that arrive too quickly.

There are many different ways to go about those things, depending on their server and networking hardware; a sketch of one approach follows.
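To make the user-agent and rate-based bans concrete, here is a minimal sketch written as Rack middleware for a Ruby app; the class name, the banned-agent pattern, and the request threshold are all hypothetical, and production setups usually do this in the web server itself or with a gem such as rack-attack.

    require 'rack'

    class BlockScrapers
      BANNED_AGENTS = /mechanize|curl|wget/i  # hypothetical pattern
      MAX_PER_MINUTE = 60                     # hypothetical threshold

      def initialize(app)
        @app = app
        @hits = Hash.new { |h, k| h[k] = [] } # ip => request timestamps
      end

      def call(env)
        ua = env['HTTP_USER_AGENT'].to_s
        ip = env['REMOTE_ADDR']

        return forbidden if ua =~ BANNED_AGENTS   # ban that user-agent

        now = Time.now
        @hits[ip].reject! { |t| now - t > 60 }    # keep only the last minute
        @hits[ip] << now
        return forbidden if @hits[ip].size > MAX_PER_MINUTE  # too many, too fast

        @app.call(env)
      end

      private

      def forbidden
        [403, { 'Content-Type' => 'text/plain' }, ["Forbidden\n"]]
      end
    end

Mounted with `use BlockScrapers` in config.ru; the in-memory hash is per-process, so a real deployment would keep the counters somewhere shared such as Redis.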

This question is pretty off-topic for Stack Overflow and should probably be asked on https://serverfault.com/ or https://webmasters.stackexchange.com/

Upvotes: 2

Thilo

Reputation: 262694

You can put up a robots.txt file and hope people respect it.
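On the scraper's side, Mechanize ships robots.txt support, but only if the author turns it on; a minimal sketch (the URL and path are hypothetical):

    require 'mechanize'

    agent = Mechanize.new
    agent.robots = true  # obey robots.txt; disallowed pages raise an error

    begin
      agent.get('http://example.com/private/')  # hypothetical disallowed page
    rescue Mechanize::RobotsDisallowedError => e
      puts "Blocked by robots.txt: #{e.message}"
    end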

If you start blocking by User-Agent string, they can just pretend to be IE.
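That spoofing takes one line in Mechanize; a short sketch (the alias name must be one your Mechanize version ships, otherwise set the raw string):

    require 'mechanize'

    agent = Mechanize.new
    # Use a built-in browser alias if this version knows it...
    agent.user_agent_alias = 'Windows IE 9'
    # ...or set any string directly:
    # agent.user_agent = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1)'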

Upvotes: 0
