Daniel Congrove

Reputation: 3669

How to prevent a development staging website, hosted on Azure, from being indexed by search engines

Specific to Web Apps hosted on Microsoft Azure, is there a way to prevent the mydomain.azurewebsites.net URL from being indexed by search engines? I'm planning to use a web app as a staging website, and don't want it to accidentally get indexed.

I know I could add a robots.txt file to the project that disallows all crawlers, but I don't want to ever accidentally publish it to the production site (or, alternatively, forget to publish it to the staging website).

Is there a setting in Azure that will prevent the ".azurewebsites.net" domain from being indexed? Or, if a robots.txt file is the only way, how do you organize an ASP.NET Core project so that the right robots.txt file is published to staging and to production?

Upvotes: 5

Views: 2802

Answers (3)

juunas

Reputation: 58743

Another option is to enable authentication against your Azure Active Directory from the Authentication/Authorization tab in your App Service's settings, for the development and staging environments only.

This way, users are forced to log in to access those apps, so an anonymous crawler never reaches the content.

Documentation: https://learn.microsoft.com/en-us/azure/app-service/app-service-authentication-overview

https://learn.microsoft.com/en-us/azure/app-service/app-service-mobile-how-to-configure-active-directory-authentication

Upvotes: 2

Dusty

Reputation: 3971

Restrict access based on hostname and request IP

Unless your staging slot needs to be accessible from a wide range of dynamic IPs, you could use the URL Rewrite module and add rules to your web app config that block all traffic except from a few known IPs. Make those rules conditional on the HOST header matching the staging host (mydomain.azurewebsites.net), so they can never apply on the production hostname.

The details in the question here show a similar type of setup.
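
As a rough illustration of the rule described above, a web.config fragment might look like the sketch below. It is not from the original answer and makes several assumptions: the app runs on IIS with the URL Rewrite module available, the rule name is made up, 203.0.113.10 is a placeholder for one of your known IPs, and {REMOTE_ADDR} is assumed to contain the real client address (behind some proxies you may need to test a forwarded-for header instead).

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <!-- Hypothetical rule name; applies only when BOTH conditions match. -->
        <rule name="BlockStagingHostExceptKnownIps" stopProcessing="true">
          <match url=".*" />
          <conditions logicalGrouping="MatchAll">
            <!-- Only the staging hostname is affected, never production. -->
            <add input="{HTTP_HOST}" pattern="^mydomain\.azurewebsites\.net$" />
            <!-- Placeholder IP: requests NOT from this address are blocked. -->
            <add input="{REMOTE_ADDR}" pattern="^203\.0\.113\.10$" negate="true" />
          </conditions>
          <!-- Return 403 instead of serving the page. -->
          <action type="CustomResponse" statusCode="403"
                  statusReason="Forbidden"
                  statusDescription="Staging access is restricted" />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

Because the host condition is part of the rule, the same web.config can be published to both environments; on the production hostname the first condition fails and the rule is skipped. To allow several IPs, the second pattern can be extended with alternation, for example `^(203\.0\.113\.10|203\.0\.113\.11)$`.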

Upvotes: 2

Rob Reagan

Reputation: 7686

You can publish robots.txt to your staging server once. This can be done via FTP or via your SCM site. Once you publish this file, web publish will not remove additional files on the server (including your robots.txt file) unless you select "Remove additional files at destination" in your web publish settings.

So the robots.txt file will stay on your staging server until you remove it. That way you do not need to include robots.txt in your project or solution, and you do not risk accidentally publishing it to your production environment.
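
For reference, a robots.txt that asks every crawler to stay out of the whole site is conventionally just two lines. This is a sketch of the file you would upload once to the staging server's site root, not something quoted from the original answer, and it only affects crawlers that honor robots.txt.

```text
# Staging only: upload manually, do not add to the project or solution.
User-agent: *
Disallow: /
```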

Upvotes: 2
