Reputation: 380
I would like to use my hosting for live testing, but I want to restrict access and prevent search engines from indexing it.
For example, the server directory structure within public_html:
_private
_bin
_cnf
_log
_... (other default hosting directories)
testpublic
css
images
index.html
I want index.html to be visible to everyone, and all the other directories (except "testpublic") to be hidden: access-protected and not indexed by search engines.
I would like the "testpublic" directory to be public but not indexed by search engines; I'm not sure if that is possible.
As I understand it, I need two .htaccess files:
one general one in "public_html" and another specific to "testpublic".
I think the general .htaccess (in public_html) should look something like this:
AuthUserFile /home/folder../.htpasswd
AuthName "test!"
AuthType Basic
Require user admin123

<FilesMatch "index.html">
Satisfy Any
</FilesMatch>
Can anyone help me create the files with the appropriate properties? Thank you!
Upvotes: 0
Views: 118
Reputation: 717
You can use a robots.txt file in your root folder. All standards-abiding robots will obey this file and not index your files and folders.
Example robots.txt that tells all (*) crawlers to move on and index nothing:
User-agent: *
Disallow: /
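If you would rather keep crawlers out of specific folders while leaving the rest crawlable, you can list per-directory rules instead. A sketch using the underscore-prefixed directory names from the question (adjust to your actual layout):
User-agent: *
Disallow: /_private/
Disallow: /_bin/
Disallow: /_cnf/
Disallow: /_log/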
You could use .htaccess files to fine-tune what your server (assuming Apache) serves and which directory listings are visible. In that case you would add
IndexIgnore *
to your .htaccess file to keep files out of auto-generated directory listings.
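If your host lets you override Options in .htaccess, a related sketch that turns the auto-generated listing off entirely instead of just emptying it:
# Serve a 403 for directory requests instead of a file listing
Options -Indexes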
Updated (Credit to https://stackoverflow.com/users/1714715/samuel-cook):
If you want to block a specific bot/crawler and know its user-agent string, you can do so in your .htaccess:
<IfModule mod_rewrite.c>
RewriteEngine on
# Match any request whose User-Agent header contains "Googlebot"
RewriteCond %{HTTP_USER_AGENT} Googlebot
# Serve a 403 Forbidden instead of the requested resource
RewriteRule ^.* - [F,L]
</IfModule>
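For the "public but not indexed" case (the testpublic folder in the question), robots.txt only stops compliant crawlers from crawling; a noindex signal is the more direct tool. A minimal sketch for a .htaccess inside testpublic, assuming mod_headers is enabled:
<IfModule mod_headers.c>
# Ask compliant crawlers not to index anything served from this directory
Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
Note that a crawler has to be able to fetch a page to see this header, so don't also Disallow that folder in robots.txt.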
Hope this helps.
Upvotes: 1