Rakesh

Reputation: 4334

Configure GSA to crawl content

Say my website is www.abc.com, and there is a specific URL pattern that contains both secure and non-secure content. For example, www.abc.com/foo/xxx serves either secured or open content, depending on the item requested.

How can I tell the GSA to use secure crawl for the secure content? I know this is simple if a specific URL pattern always serves secured content. I have read Google's support site, but how will the GSA know that some URLs are secured content? I can't list all the URLs in the GSA admin console because there are more than 10K such unique URLs.

Upvotes: 0

Views: 442

Answers (2)

Michael Cizmar

Reputation: 462

The answer to your question (and not your problem) is:

The GSA determines whether content is secure based on the HTTP response from the web server. If a URL responds with a 401 or a 301/302, the GSA will assume the content is secure.

Content is treated as public if the URL responds with a 200.
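To make the rule above concrete, here is a minimal sketch in Python of the classification as described in this answer. The function name and the "unknown" fallback are my own illustration, not part of the GSA; the appliance applies this logic internally during the crawl.

```python
def classify_by_status(status_code: int) -> str:
    """Classify a crawled URL using the rule described in this answer.

    Hypothetical helper for illustration only; it is not a GSA API.
    """
    if status_code == 200:
        # A plain 200 means the server handed over the content without a challenge.
        return "public"
    if status_code in (401, 301, 302):
        # An auth challenge or a redirect (for example to a login page).
        return "secure"
    # Anything else is outside the rule stated above.
    return "unknown"


if __name__ == "__main__":
    for code in (200, 401, 302, 404):
        print(code, classify_by_status(code))
```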

Upvotes: 0

Mohan kumar

Reputation: 458

As I understand it, some URLs on your website are secured and the rest are public, and you want the GSA to use Controlled-Access Content crawling only for the secured URLs.

If that is the case, move all the secured content under a common pattern, e.g. www.abc.com/secured/xxx, and crawl that pattern using Controlled-Access Content crawling.

If that is not feasible, add a meta tag to the web pages (viewers=public for open pages, viewers=secured for secured pages), crawl the whole site using Controlled-Access Content crawling, and make all URLs public in the GSA configuration. When serving results, do the authentication in your application and query the GSA with the appropriate viewers value in the requiredfields parameter.
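As a rough sketch of the serving side, the application could build the search URL along these lines. Assumptions on my part: the host name gsa.example.com, the function name, and the user_is_authenticated flag are hypothetical; the pages are assumed to carry a meta tag named viewers with content public or secured; and multiple requiredfields values are assumed to be OR-ed with "|". Other search parameters (collection, front end, output format) are left out for brevity.

```python
from urllib.parse import urlencode


def build_search_url(host: str, query: str, user_is_authenticated: bool) -> str:
    """Build a search URL that filters results on the 'viewers' meta tag.

    Hypothetical helper: the application decides who the user is, then
    restricts results through the requiredfields parameter.
    """
    # Everyone may see pages tagged viewers=public.
    allowed = ["public"]
    if user_is_authenticated:
        # Logged-in users may also see pages tagged viewers=secured.
        allowed.append("secured")

    # name:value pairs, joined with "|" (assumed to mean OR).
    required = "|".join(f"viewers:{value}" for value in allowed)

    params = {"q": query, "requiredfields": required}
    # Keep ":" and "|" unescaped so the filter stays readable in the URL.
    return f"http://{host}/search?" + urlencode(params, safe=":|")


if __name__ == "__main__":
    print(build_search_url("gsa.example.com", "annual report", False))
    print(build_search_url("gsa.example.com", "annual report", True))
```

The point of this design is that the GSA itself does no per-user authorization here; the application is solely responsible for never sending viewers:secured on behalf of an anonymous user.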

Upvotes: 1
