Jose Gomez

Reputation: 43

How do I keep search engines from indexing a PDF?

I have built a site in WordPress and set up user logins for members only, so anyone with a username and password can view the members page once they are logged in, while everyone else can only browse the public site. I got that working, but the client has uploaded PDFs for members to view on that page, and when the client searched the web for names contained in one of those PDFs, the file turned up even though it should be accessible only to logged-in users, not the public. Is there any way to make the PDF private so it is not indexed by search engines? And, better still, can I set it up so that no one with just the link can view it, only those who are logged in?

Upvotes: 0

Views: 1895

Answers (3)

David

Reputation: 1

Add the following to your robots.txt file:

User-agent: *
Disallow: /*.pdf$

After a while, depending on how quickly the search engines re-index your site, the PDF will stop showing up in search results. You can visit https://pdflookup.com and enter your PDF's title to check.

Upvotes: 0

sadaf2605

Reputation: 7540

Solution 1: Password protection

Protecting the site with HTTP Basic Authentication is the most reliable way to block everyone else from accessing it, but it is not always practical, for example when you need a demo audience to test the site.
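Here is a minimal sketch that applies Basic Authentication only to PDF files via .htaccess (the .htpasswd path below is a placeholder; point it at a file outside your web root):

<FilesMatch "\.pdf$">
    AuthType Basic
    AuthName "Members Only"
    AuthUserFile /home/example/.htpasswd
    Require valid-user
</FilesMatch>

Create the password file with the htpasswd utility, e.g. htpasswd -c /home/example/.htpasswd someuser. Unlike robots.txt, this blocks direct access as well, not just crawling.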

Solution 2: Robots.txt

Another solution Google provides is to use a robots.txt file to tell bots not to crawl pages or list them in results. But that is not always enough: Google's Matt Cutts has confirmed that Google may still include pages from such sites in results if it thinks they are relevant.

User-agent: *
Disallow: /

To block only a specific file, put its path on the Disallow line instead of blocking the whole site, as in the sketch below.
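For example, to keep compliant crawlers away from a single PDF (the path here is hypothetical; use the file's real location under your uploads directory):

User-agent: *
Disallow: /wp-content/uploads/members-report.pdf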

Solution 3: Using .htaccess RewriteCond

A stronger option is to block Google and other similar bots from accessing your site altogether. For that, put the following code in your .htaccess:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} AltaVista [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp
RewriteRule ^.*$ http://htmlremix.com [R=301,L]

Change the URL in the last line to your main site so that any inbound links to the blocked site still pass SEO value to it.
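If you only need to keep bots away from the PDFs rather than the whole site, a variation on the same idea is to return 403 Forbidden when a known crawler requests a PDF (a sketch; the user-agent list is illustrative, not exhaustive):

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|Slurp) [NC]
RewriteRule \.pdf$ - [F,L]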

Solution 4: Request Google to remove the URL

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=164734&from=61062&rd=1

Solution 5: A few other tools you may like to go through

http://www.debianhelp.co.uk/htaccessweb.htm

Upvotes: 2

nizz

Reputation: 1133

Use the robots.txt file to tell the crawler not to look into your PDF files. Something like this:

User-agent: *
Disallow: /*.pdf$


Upvotes: 1
