Reputation: 115
Upvotes: 0
Views: 1057
Reputation: 871
Ahhh... in that case, you'll want to set your User-Agent to something standard that doesn't stand out. That will fool some websites. For instance, Firefox sends: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0
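As a rough illustration in Python, the standard library's urllib.request.Request accepts a headers mapping, so the User-Agent can be set per request. This is only a sketch: the URL is a placeholder, build_request is a name made up for the example, and the request is built but not sent.

```python
import urllib.request

# The Firefox User-Agent string quoted above.
FIREFOX_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) "
    "Gecko/20100101 Firefox/77.0"
)


def build_request(url: str, user_agent: str = FIREFOX_UA) -> urllib.request.Request:
    """Return a Request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder URL; swap in the page you actually want to fetch.
req = build_request("https://example.com/")
# urllib.request.urlopen(req) would send it; left out so the sketch runs offline.
```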
The smarter sites, or those behind Cloudflare, will still recognize you as a fake. So you'll want to use something like UiPath or Selenium to mimic a human. There really is no other way to fool the big boys like Amazon. You can most likely get the data you're after through their API, but it has limits.
*NB: I left my previous answer up because I'm sure people will google this wanting to know how to restrict certain bots. This one answers the OP's question more directly.
Upvotes: 1
Reputation: 31
Sites use a lot of techniques to prevent crawling. If you want to crawl such a site, you should make your crawler behave like a person.
1) Sleep for a random time between requests
2) Set a random User-Agent on each request
3) Route your requests through a farm of proxies
There are also other ways to crawl such a site, which you can find by analyzing its HTTP traffic.
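The three steps above could be sketched in Python roughly as follows, using only the standard library. The User-Agent pool, the proxy addresses, the delay range, and all function names are assumptions made up for illustration, not values from any real setup.

```python
import random
import time
import urllib.request

# Illustrative pool; a real crawler would keep a longer, up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0",
]

# Hypothetical proxy addresses (documentation-only IP range), standing in
# for a real proxy farm.
PROXIES = ["http://192.0.2.10:8080", "http://192.0.2.11:8080"]


def random_delay(min_s: float = 1.0, max_s: float = 5.0) -> float:
    """Step 1: pick a random pause length in seconds."""
    return random.uniform(min_s, max_s)


def build_request(url: str) -> urllib.request.Request:
    """Step 2: attach a randomly chosen User-Agent to the request."""
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return urllib.request.Request(url, headers=headers)


def build_opener() -> urllib.request.OpenerDirector:
    """Step 3: route traffic through a randomly chosen proxy."""
    proxy = random.choice(PROXIES)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Pause, then fetch one URL with a fresh User-Agent and proxy."""
    time.sleep(random_delay())
    with build_opener().open(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Calling fetch() once per page gives each request its own delay, User-Agent, and proxy. It needs working proxies to actually return anything.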
Upvotes: 0
Reputation: 871
There are two simple ways of significantly reducing the number of bots crawling your site:
Put a robots.txt file in your root directory. This provides instructions to bots. Well-behaved bots will follow it, but ones masquerading as a real user will not (which is why Cloudflare is so great: it blocks almost all of the bad bots). However, robots.txt is usually sufficient. For instance, if you wanted to block all bots from your entire site, you would use:
User-agent: *
Disallow: /
This asks all bots to stay out, including legitimate ones like Googlebot. You usually don't want to do this for the whole site, only for your admin directory or a few other directories, by putting that directory's path after Disallow: instead of /.
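If you want to check how a rule set will be read before deploying it, Python's standard urllib.robotparser module can parse the text directly. A minimal sketch, in which the allowed helper and the /admin/ path are assumptions for illustration:

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The rule set shown above: every bot is asked to stay off the whole site.
block_everything = "User-agent: *\nDisallow: /\n"

# Hypothetical variant that only fences off an admin directory.
block_admin_only = "User-agent: *\nDisallow: /admin/\n"

print(allowed(block_everything, "Googlebot", "/index.html"))   # False
print(allowed(block_admin_only, "Googlebot", "/index.html"))   # True
print(allowed(block_admin_only, "Googlebot", "/admin/users"))  # False
```

This only tells you what a compliant bot would do. It says nothing about bots that ignore robots.txt.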
The following would block Googlebot completely:
User-agent: Googlebot
Disallow: /
Building on the previous example, analyze your Google Analytics data, look for suspicious User-agents, and replace Googlebot above with the name of each agent you want to block.
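If you have raw server access logs rather than Analytics reports, one way to spot candidates is to tally the User-Agent strings yourself. The sketch below assumes the common "combined" log format, where the User-Agent is the last quoted field on each line. The sample lines and bot names are made up.

```python
import re
from collections import Counter

# Assumption: combined log format, so the User-Agent is the final "..." field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(log_lines):
    """Count how many requests each User-Agent string made."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines; in practice, read them from your access log file.
sample = [
    '192.0.2.1 - - [10/Jun/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '192.0.2.1 - - [10/Jun/2020:10:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "BadBot/1.0"',
    '192.0.2.7 - - [10/Jun/2020:10:00:05 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

# Busiest agents first; unusually high counts are the ones worth a closer look.
for agent, hits in count_user_agents(sample).most_common():
    print(hits, agent)
```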
Upvotes: 0