Reputation: 21
I have a problem with bots copying all the content from my website, which I update quite often.
I have tried banning them and obfuscating the code to make it harder to copy, but they always find a way around these measures.
I'd like to limit the number of hits per minute (or per X amount of time, not necessarily minutes), but let users get past the limit by solving a Captcha. For example, if you've requested more than 10 pages in the last 5 minutes, you need to prove you're human by solving a Captcha. That way, a legitimate user can keep browsing the site.
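A sliding window is one way to express the "more than 10 pages in the last 5 minutes" rule: keep each client's recent request timestamps, drop the ones older than the window, and challenge once the count exceeds the limit. This is a minimal sketch; `recordHitAndCheck()` is a hypothetical name, and the log is a plain PHP array. Where that array lives between requests is a separate problem, since PHP does not keep variables in memory across requests. The client id would typically be the visitor's IP address (`$_SERVER['REMOTE_ADDR']`).

```php
<?php
/**
 * Sliding-window check (sketch): records a hit for $clientId at time $now and
 * returns true when the client has made more than $limit requests within the
 * last $window seconds. $log maps client id => list of Unix timestamps.
 */
function recordHitAndCheck(array &$log, string $clientId, int $now, int $limit = 10, int $window = 300): bool
{
    $cutoff = $now - $window;

    // Keep only the timestamps that are still inside the window.
    $hits = array_values(array_filter(
        $log[$clientId] ?? [],
        fn (int $t): bool => $t > $cutoff
    ));

    // Record the current request, then compare against the limit.
    $hits[] = $now;
    $log[$clientId] = $hits;

    return count($hits) > $limit;
}
```

Requests made while over the limit are counted too, so a bot that ignores the Captcha and keeps hammering the site stays blocked until it slows down.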
I'd like to apply this only to the content pages, to keep the overhead low. I had thought of Memcached, but since I don't own the server, I can't use it. If I were using servlets I'd keep the counts in a HashMap or similar, but since I'm using PHP, where nothing stays in memory between requests, I'm still trying to think of a solution.
I don't see MySQL (or any database) as a solution, since I can get many hits per second, and I would also have to keep deleting requests older than a few minutes, which creates a lot of unnecessary and inefficient database traffic.
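One database-free option on shared hosting is the filesystem: one small file per client, locked with `flock()` while it is read and rewritten. Pruning timestamps older than the window during that same write means there is no separate stream of delete queries. This is a minimal sketch, assuming the web server can write to a directory of your choosing; `tooManyHits()` and the one-JSON-file-per-client layout are hypothetical. If your host happens to have the APCu extension enabled, `apcu_fetch()` and `apcu_store()` give a shared in-memory cache that is closer to a servlet's `HashMap`.

```php
<?php
/**
 * File-backed sliding-window limiter (sketch). Stores one JSON list of
 * timestamps per client in $dir; flock() serialises concurrent requests from
 * the same client, and stale entries are dropped in the same rewrite.
 */
function tooManyHits(string $dir, string $clientId, int $limit = 10, int $window = 300, ?int $now = null): bool
{
    $now ??= time();
    // Hash the id so any string (e.g. an IPv6 address) is a safe file name.
    $path = $dir . '/' . sha1($clientId) . '.json';

    $fh = fopen($path, 'c+'); // create if missing, do not truncate
    if ($fh === false) {
        return false; // fail open: never lock users out because of an I/O error
    }

    try {
        flock($fh, LOCK_EX);
        $raw  = stream_get_contents($fh);
        $hits = $raw ? (json_decode($raw, true) ?: []) : [];

        // Drop timestamps outside the window and record this request.
        $cutoff = $now - $window;
        $hits   = array_values(array_filter($hits, fn ($t) => $t > $cutoff));
        $hits[] = $now;

        // Rewrite the file with only the timestamps still inside the window.
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, json_encode($hits));
        fflush($fh);

        return count($hits) > $limit;
    } finally {
        flock($fh, LOCK_UN);
        fclose($fh);
    }
}
```

At the top of each content page this could be called as `tooManyHits('/path/to/hits', $_SERVER['REMOTE_ADDR'])`, showing the Captcha instead of the page when it returns true. Files for clients that never return stay on disk, so an occasional sweep that deletes them by modification time keeps the directory small.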
Any ideas?
In summary: if a client makes too many requests per minute to one section of the site, I'd like to throttle it efficiently in PHP using a Captcha. For example, anyone who has requested more than 10 pages in the last 5 minutes has to prove they are human by solving a Captcha.
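To let a legitimate user keep browsing after solving the Captcha, the server has to remember that this visitor already passed. A minimal sketch, assuming the pass is stored in the PHP session (the array argument would normally be `$_SESSION`) under a hypothetical `human_until` key, and that the Captcha answer itself is verified elsewhere; both function names are hypothetical.

```php
<?php
/**
 * Returns true when the visitor must solve a Captcha before seeing the page.
 * $session would normally be $_SESSION; 'human_until' is a hypothetical key
 * holding the Unix time until which a solved Captcha is honoured.
 */
function mustSolveCaptcha(array $session, bool $overLimit, int $now): bool
{
    $humanUntil = (int) ($session['human_until'] ?? 0);
    // Only challenge clients that are over the limit and hold no valid pass.
    return $overLimit && $humanUntil < $now;
}

/**
 * Call once the Captcha answer has been checked (verification not shown):
 * grants a pass valid for $passSeconds seconds.
 */
function grantHumanPass(array &$session, int $now, int $passSeconds = 1800): void
{
    $session['human_until'] = $now + $passSeconds;
}
```

The hit counter itself is better keyed on the IP address than on the session: a bot that discards cookies gets a fresh, empty session on every request, so a session-based counter would never fill up. The same property helps with the pass, because a cookie-less bot cannot reuse it, and the expiry limits how long one solved Captcha can be replayed by a bot that does keep cookies.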
Upvotes: 2
Views: 1433
Reputation: 5439
Your question kind of goes against the spirit of the internet.
I'd guess the real problem is that these bots are stealing your traffic. If so, I'd suggest implementing an API that lets them use your content legitimately.
That way you control access and, crucially, you can require a link back to your site in return for using your content, so your site should rank first for that content. You don't even really need an API to implement this policy.
If you insist on restricting user access you have the following choices:
The problem is that if you want your content to be found by Google AND hidden from other bots, you're asking for the impossible.
Your best option is to create an API and control how people copy your content, rather than trying to prevent it.
Upvotes: 2