Reputation: 18859
I'm creating a robots.txt file for my website, but looking through my project structure, I'm not sure what to disallow.
Do I need to disallow standard .NET MVC directories and files like /App_Data, /web.config, /Controllers, /Models, /Global.asax? Or will those not be indexed anyway?
What about directories like /bin and /obj?
If I want to disallow a page, do I disallow /Views/MyPage/Index.cshtml, or /MyPage?
Also, when specifying the sitemap in the robots.txt file, can I use my Web.sitemap, or does it need to be a different xml file?
Upvotes: 2
Views: 2150
Reputation: 25200
'robots.txt' refers to paths as they are publicly seen by Web crawlers.
There's nothing particularly special about a crawler: it merely uses HTTP to request pages from your site precisely like a user does.
So, provided your MVC site is properly configured, files like /web.config or the other paths you mention won't be visible to the outside world, since neither IIS nor your application will be configured to serve them. Even if a crawler were pointed at those files, it would receive a 404 Not Found and move on.
Similarly, your .cshtml or .aspx content files won't be seen with those extensions. Rather, a Web crawler will see precisely what you show to users.
Upvotes: 4