user6680

Reputation: 139

If a robots.txt file only lists a single user-agent, are other bots allowed to crawl?

The following robots.txt file disallows every path for magpie-crawler. Suppose I were using a different web crawler, such as Scrapy. This robots.txt doesn't list any other user-agent, so would a Scrapy bot be allowed to scrape the site?

User-agent: magpie-crawler
Disallow: /


Sitemap: https://www.digitaltrends.com/sitemap_index.xml
Sitemap: https://www.digitaltrends.com/news.sitemap.google.xml
Sitemap: https://www.digitaltrends.com/image-sitemap-index.xml

Upvotes: 2

Views: 165

Answers (2)

You can scrape the site with Scrapy. Just set a browser-like User-Agent header in your Scrapy settings:

'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36'

Upvotes: -1

Shane Fontaine

Reputation: 2439

According to the official website, yes: only that single bot is disallowed. A crawler that matches no User-agent record is not restricted, and this file has no User-agent: * record, so you may use Scrapy if you wish.

If they had wanted to, they could have allowed only one bot and blocked everyone else:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
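
To check this kind of rule yourself, here is a minimal sketch using Python's standard urllib.robotparser module. It parses the file from the question and an allow-only-one-bot file like the one above, then asks whether a given user-agent may fetch a URL. The helper name is_allowed, the example URL path, and the use of the Googlebot token are assumptions made for illustration, and other robots.txt parsers may match user-agent names slightly differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question (sitemap list shortened).
QUESTION_ROBOTS_TXT = """\
User-agent: magpie-crawler
Disallow: /

Sitemap: https://www.digitaltrends.com/sitemap_index.xml
"""

# A policy that allows one bot and blocks all others.
# The "Googlebot" token is an assumption for illustration.
ALLOW_ONE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

# Hypothetical page on the site, used only as a test URL.
URL = "https://www.digitaltrends.com/news/"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for name, text in [("question", QUESTION_ROBOTS_TXT),
                       ("allow-one", ALLOW_ONE_ROBOTS_TXT)]:
        for agent in ("magpie-crawler", "Googlebot", "Scrapy"):
            print(name, agent, is_allowed(text, agent, URL))
```

With the file from the question, only magpie-crawler is refused. With the second file, only Googlebot is let through, because every other agent falls back to the User-agent: * record.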

Upvotes: 2
