Reputation: 64363
I want to restrict crawler access to my Rails app running on Heroku. This would have been a straightforward task if I were using Apache or nginx, but since the app is deployed on Heroku, I am not sure how I can restrict access at the HTTP server level.
I have tried using a robots.txt file, but the offending crawlers don't honor it.
These are the solutions I am considering:
1) A before_filter in the Rails layer to restrict access.
2) A Rack-based solution to restrict access.
I am wondering if there are any better ways to deal with this problem.
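For reference, here is a rough sketch of what I have in mind for option 1. The BLOCKED_UAS list is hypothetical and assumes the offending crawlers identify themselves via the User-Agent header:

```ruby
# app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  # Hypothetical list of User-Agent substrings to block.
  BLOCKED_UAS = ["BadBot", "EvilCrawler"].freeze

  before_filter :block_crawlers

  private

  # Renders a 403 before any action runs if the request's
  # User-Agent matches one of the blocked crawlers.
  def block_crawlers
    ua = request.user_agent.to_s
    if BLOCKED_UAS.any? { |bot| ua.include?(bot) }
      render text: "Forbidden", status: 403
    end
  end
end
```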
Upvotes: 4
Views: 2148
Reputation: 4176
I have read about honeypot solutions: you have one URI that must not be crawled (and that you disallow in robots.txt). If any IP requests this URI, block it. I'd implement it as Rack middleware so the hit never reaches the full Rails stack.
Sorry, I googled around but could not find the original article.
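A minimal sketch of the idea, using a hypothetical trap path /honeypot (which you would also list in robots.txt). The blocked-IP set here is in-memory only, so each Heroku dyno keeps its own list and it resets on restart; a real deployment would want a shared store such as Redis:

```ruby
# Enable it in config/application.rb (or an initializer):
#   config.middleware.insert_before 0, BotTrap
#
# public/robots.txt should contain:
#   User-agent: *
#   Disallow: /honeypot

require 'set'

class BotTrap
  TRAP_PATH = "/honeypot".freeze  # hypothetical honeypot URI

  def initialize(app)
    @app = app
    @blocked_ips = Set.new  # per-process only; use Redis or similar across dynos
  end

  def call(env)
    req = Rack::Request.new(env)

    # Any client that requests the trap URI gets blacklisted.
    @blocked_ips << req.ip if req.path == TRAP_PATH

    if @blocked_ips.include?(req.ip)
      # Short-circuit here so the hit never reaches the Rails stack.
      [403, { "Content-Type" => "text/plain" }, ["Forbidden"]]
    else
      @app.call(env)
    end
  end
end
```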
Upvotes: 9