Reputation: 827
I'm using this guide to protect a folder via .htaccess and a PHP script.
We use a Google Search Appliance to index this particular protected folder. However, I'm not sure how to let the crawler through.
To test, I used a Firefox add-on to spoof my user agent (to msnbot in this case) and used echo $_SERVER['HTTP_USER_AGENT'] to verify that msnbot/1.1 (+http://search.msn.com/msnbot.htm) was in fact the user agent being reported.
This is the string of conditionals the authentication script checks against. All of these conditions work except the last:
current_user_can('edit_posts') || mm_member_decision( array ( "isMember"=>"true", "hasBundle"=>"1", "status" => "active" ) ) || auth_redirect() || ($_SERVER['HTTP_USER_AGENT'] == 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)');
Upvotes: 0
Views: 241
Reputation: 827
Figured it out: || auth_redirect() should be last in the conditional. Because || short-circuits, auth_redirect() was redirecting the (not-logged-in) crawler to the login page before the user-agent check was ever reached.
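For reference, a minimal sketch of the reordered check (the surrounding assignment is my addition; the conditions themselves are from the question):

    // || short-circuits left to right. auth_redirect() sends anyone who
    // is not logged in to the login page, so it must come after the
    // user-agent check or the crawler never reaches that comparison.
    $allowed = current_user_can('edit_posts')
        || mm_member_decision( array( "isMember" => "true", "hasBundle" => "1", "status" => "active" ) )
        || ($_SERVER['HTTP_USER_AGENT'] == 'msnbot/1.1 (+http://search.msn.com/msnbot.htm)')
        || auth_redirect(); // last: only runs when every other check has failed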
Upvotes: 1
Reputation: 41796
The Google Search Appliance user agent is named gsa-crawler.
A full user-agent string might look like this:
gsa-crawler (Enterprise; GID09999; [email protected])
https://developers.google.com/search-appliance/documentation/614/help_gsa/crawl_headers
Allow that user agent through for a successful crawl.
And since you've already figured out that the user agent alone is not enough, add a check for the ID (GID) and/or the email as well; see the sketch below.
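A minimal sketch of such a check (the GID value below just mirrors the example string above; use the one your appliance actually sends):

    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    // Require the appliance-specific GID in addition to the crawler name,
    // since the User-Agent header alone is trivial to spoof.
    $is_gsa = (strpos($ua, 'gsa-crawler') !== false)
        && (strpos($ua, 'GID09999') !== false);

Keep in mind that a spoofed header can still include the GID; if you need a stronger guarantee, restricting access by the appliance's IP address is an option.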
Upvotes: 1