Bhaskar Choudhary

Reputation: 215

robots.txt allows & disallows a few pages, what does it mean for the other pages?

I was going through many websites' robots.txt files to check if I could scrape some specific pages. When I see the following pattern:

User-agent: *
Allow: /some-page
Disallow: /some-other-page

There is nothing else in the robots.txt file. Does it mean that all the remaining pages on the given website are available to be scraped?
P.S. I tried googling this specific case, but no luck.

Upvotes: 1

Views: 881

Answers (1)

Shoejep

Reputation: 4839

According to this website, Allow is used to allow a directory when its parent may be disallowed. I found this website quite useful as well.

Disallow: The command used to tell a user-agent not to crawl a particular URL. Only one "Disallow:" line is allowed for each URL.

Allow (Only applicable for Googlebot): The command to tell Googlebot it can access a page or subfolder even though its parent page or subfolder may be disallowed.

Regarding your question: pages that aren't matched by any Disallow rule are allowed by default, so if the remaining pages aren't covered by a Disallow entry, you should be okay.
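If you want to check this programmatically rather than by reading the file, here's a minimal sketch using Python's standard urllib.robotparser. It parses the exact rules from your question; the example.com domain and the /another-page path are made-up placeholders for a page the file doesn't mention.

    from urllib.robotparser import RobotFileParser

    # The rules from the question
    robots_txt = """\
    User-agent: *
    Allow: /some-page
    Disallow: /some-other-page
    """

    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())

    # Explicitly allowed
    print(rp.can_fetch("*", "https://example.com/some-page"))        # True
    # Explicitly disallowed
    print(rp.can_fetch("*", "https://example.com/some-other-page"))  # False
    # Not mentioned anywhere -> allowed by default
    print(rp.can_fetch("*", "https://example.com/another-page"))     # True

The last call returning True reflects the default behaviour: anything not matched by a Disallow rule is considered crawlable.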

Upvotes: 1
