The following robots.txt file disallows all directories for magpie-crawler. Suppose I use a different web crawler, such as Scrapy. Since this robots.txt doesn't list any other user agent, would a Scrapy bot be allowed to scrape the site?
User-agent: magpie-crawler
Disallow: /
Sitemap: https://www.digitaltrends.com/sitemap_index.xml
Sitemap: https://www.digitaltrends.com/news.sitemap.google.xml
Sitemap: https://www.digitaltrends.com/image-sitemap-index.xml
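One way to check this is to feed the file to Python's standard-library robots.txt parser and ask it about several user agents. This is a minimal sketch: the article URL is a hypothetical path chosen for illustration, and `allowed` is a helper defined here, not part of any library. robots.txt is advisory, so the parser only reports what the file requests; it doesn't enforce anything.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: magpie-crawler
Disallow: /
Sitemap: https://www.digitaltrends.com/sitemap_index.xml
Sitemap: https://www.digitaltrends.com/news.sitemap.google.xml
Sitemap: https://www.digitaltrends.com/image-sitemap-index.xml
"""

# Hypothetical article URL, used only for illustration.
URL = "https://www.digitaltrends.com/some-article/"


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("magpie-crawler", "Scrapy", "Googlebot"):
    print(agent, allowed(agent, URL))
# magpie-crawler False
# Scrapy True
# Googlebot True
```

Only the agent named in the `User-agent:` line is blocked; because there is no `User-agent: *` group, every other agent falls through to "allowed".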
You can scrape the data with Scrapy. Just set the User-Agent header to a web browser's string in the Scrapy settings (settings.py):
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36'
According to the official robots.txt website, this means that only that single bot is disallowed, so you may use Scrapy if you wish.
Had they wanted to, they could instead have allowed only one bot and blocked all others:
User-agent: Google
Disallow:
User-agent: *
Disallow: /
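The "allow only one bot" file above can be checked the same way with Python's standard-library parser. This is a sketch with a hypothetical example.com URL. Note that this parser treats the robots.txt token as a case-insensitive substring of the crawler's name, so `Google` also matches `Googlebot`; other crawlers may match names differently.

```python
from urllib.robotparser import RobotFileParser

# The "allow only one bot" robots.txt from the answer above.
ONLY_GOOGLE = """\
User-agent: Google
Disallow:
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ONLY_GOOGLE.splitlines())

# Hypothetical page URL, used only for illustration.
URL = "https://www.example.com/some-page/"

for agent in ("Googlebot", "Scrapy", "magpie-crawler"):
    # An empty "Disallow:" allows everything for Google;
    # the "*" group blocks every other agent.
    print(agent, parser.can_fetch(agent, URL))
# Googlebot True
# Scrapy False
# magpie-crawler False
```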