rockstardev

Reputation: 13527

How to discourage scraping on a Drupal website?

I have a Drupal website that has a ton of data on it. However, people can scrape the site quite easily, because Drupal's class names and IDs are pretty consistent.

  1. Is there any way to "scramble" the code to make it harder to use something like PHP Simple HTML Dom Parser to scrape the site?
  2. Are there other techniques that could make scraping the site a little harder?
  3. Am I fighting a lost cause?

I am not sure if "scraping" is the official term, but I am referring to the process by which people write a script that "crawls" a website and parses sections of it in order to extract data and store it in their own database.

Upvotes: 0

Views: 1166

Answers (2)

Igor Savinkin

Reputation: 6267

  1. First, I'd recommend googling "web scraping anti-scrape". You'll find some tools for fighting web scraping there.
  2. As for Drupal, there should be some anti-scrape plugins available (google for them).
  3. You might be interested in my answer with a categorized layout of anti-scrape techniques. It's for technical as well as non-technical users.

Upvotes: 2

Bustikiller

Reputation: 2498

I am not sure, but I think it is quite easy to crawl a website where all the content is public, whether or not the IDs are sequential. Keep in mind that if a human can read your Drupal site, a script can too.
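
To illustrate why scrambling class names and IDs buys little, here is a minimal sketch in Python (not Drupal code) that pulls headings out of a page by tag alone. The markup, the `TextByTag` class and the `extract` helper are all made up for the example; a real scraper would likely use a fuller parser, but the idea is the same.

```python
from html.parser import HTMLParser


class TextByTag(HTMLParser):
    """Collect the text inside every occurrence of one tag, ignoring class and id."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # greater than 0 while inside the target tag
        self.found = []   # one string per matched element

    def handle_starttag(self, tag, attrs):
        # attrs (class, id, ...) are deliberately never inspected
        if tag == self.tag:
            if self.depth == 0:
                self.found.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.found[-1] += data


def extract(html, tag="h2"):
    """Return the stripped text of each <tag> element in the given HTML string."""
    parser = TextByTag(tag)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.found]


# Hypothetical markup: the same page with stable and with randomized attributes.
stable = '<div class="node"><h2 class="node-title">First post</h2></div>'
scrambled = '<div class="x9f2"><h2 class="q7ab" id="k3">First post</h2></div>'

print(extract(stable))
print(extract(scrambled))
```

Both calls return the same heading text, because the script relies on the document structure rather than on the attribute values.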

Depending on the nature of your site, if you don't want others to index your content, you should consider restricting access to registered users. Otherwise, I think you are fighting a lost cause.

Upvotes: 1
