Reputation: 215
I was going through many websites' robots.txt files to check if I could scrape some specific pages. When I see the following pattern -
User-agent: *
Allow: /some-page
Disallow: /some-other-page
There is nothing else in the robots.txt file. Does this mean that all the remaining pages on the website are available to be scraped?
P.S. - I tried googling this specific case, but had no luck.
Upvotes: 1
Views: 881
Reputation: 4839
According to this website, Allow is used to allow a directory when its parent may be disallowed. I found this website quite useful as well.
Disallow: The command used to tell a user-agent not to crawl a particular URL. Only one "Disallow:" line is allowed for each URL.
Allow (Only applicable for Googlebot): The command to tell Googlebot it can access a page or subfolder even though its parent page or subfolder may be disallowed.
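If it helps to see that behaviour concretely, here is a minimal sketch using Python's standard urllib.robotparser with made-up rules (the /private/ paths and example.com are placeholders, not from your robots.txt). Note that Python's parser applies rules in listed order (first match wins), so the Allow line is placed first here; Googlebot instead picks the most specific matching rule.

from urllib.robotparser import RobotFileParser

# Hypothetical rules: the parent directory is disallowed,
# but one page inside it is explicitly allowed.
rules = """
User-agent: *
Allow: /private/public-page
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/private/public-page"))  # True  (explicitly allowed)
print(rp.can_fetch("*", "https://example.com/private/secret"))       # False (parent directory disallowed)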
Regarding your question: if the remaining pages aren't covered by any Disallow rule, you should be okay.
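You can check that quickly against the exact pattern you posted. A minimal sketch, again using urllib.robotparser (example.com and /unlisted-page are placeholders):

from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Allow: /some-page
Disallow: /some-other-page
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Anything not matched by a Disallow rule falls back to "allowed".
print(rp.can_fetch("*", "https://example.com/some-other-page"))  # False (disallowed)
print(rp.can_fetch("*", "https://example.com/unlisted-page"))    # True  (not mentioned at all)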
Upvotes: 1