Reputation: 1552
Edit:
This seemed like a pretty good question, and now I see that there might be a far more straightforward answer than what I just spent the entire day coding up.
I found a reference saying that you can put this in the .htaccess file and search engines will not index the PDF files. Way too simple. I guess this is not well known, or someone would have saved me the 7 hours I put into coding up something close to the one answer I got.
# Requires mod_headers to be enabled
<FilesMatch "\.pdf$">
Header set X-Robots-Tag "noindex"
</FilesMatch>
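One way to confirm the rule is taking effect is to request a PDF and look for the header in the response. The following is a minimal Python sketch, not part of the original setup: the function names `has_noindex` and `pdf_is_noindexed` are made up for illustration, and it only recognizes the plain `noindex` form of the header, not user-agent-scoped values such as `googlebot: noindex`.

```python
from urllib.request import Request, urlopen


def has_noindex(headers):
    """Return True if any X-Robots-Tag header lists the 'noindex' directive.

    `headers` is an iterable of (name, value) pairs. Header names are
    compared case-insensitively; user-agent-scoped values are not handled.
    """
    for name, value in headers:
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives:
                return True
    return False


def pdf_is_noindexed(url):
    """Send a HEAD request to `url` and inspect the response headers.

    This needs network access; the URL is supplied by the caller.
    """
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return has_noindex(response.headers.items())
```

Calling `pdf_is_noindexed("https://example.com/files/manual.pdf")` (a placeholder URL) should return True once the .htaccess rule is active for that file.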
I have a website that gives PDF files away and is ad-supported. Lately Google has been linking directly to the PDF instead of the web page that has the embedded PDF and the ads. Seems crazy, as they are Google ads, but nice for the user, I guess. My revenue has dropped by half. I can make the search results go to another page by creating a directory called .pdf and putting an index.php file in it with the appropriate header redirects. In the new file I can rename the PDF, but this just delays the problem until they index the new location. Doing this to 700 files is not something I want to do every week.
I have considered converting the PDF files to HTML and have tested Zamzar, Wondershare, SomePDF, and IntraPDF, and none of them do a good job. Some of the conversions were almost readable, some were just a white page, some a black page, and one was a black page with some blotches here and there. I tried an online service a few hours ago and have yet to get the email with my file.
I am not set on PDF-to-HTML conversion; it is just what I could think of.
Perhaps there is a better solution. Others must have had this problem and solved it somehow. Obviously I need the page to be searchable as well, so just converting everything to images isn't a solution. I don't know what to do.
Upvotes: 0
Views: 294
Reputation: 14113
You have to choose: either Google reads the PDFs and indexes them, in which case they will come up in search results independently, or you exclude the PDFs in robots.txt and Google will not read or index them at all.
You can't ask Google to index the PDFs but credit the result to the parent page only. To do what you want, you need to stop linking to the PDFs.
If you use a Flash-based PDF viewer or something similar instead of actually linking to the PDFs in an iframe, this might solve your problem.
Upvotes: 1
Reputation: 89
Does your SEO rely on content inside the PDFs? If not, you could create a robots.txt file disallowing search engine access to the folder containing the PDF files.
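As a rough illustration of that suggestion, the sketch below holds a two-line robots.txt in a string and uses Python's standard `urllib.robotparser` to check which URLs it blocks. The folder name `/pdfs/`, the `example.com` URLs, and the helper `is_allowed` are assumptions made for the example; substitute whatever folder actually holds the files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler from the /pdfs/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/pdfs/manual.pdf",
        "https://example.com/manual.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

With these rules the PDF under `/pdfs/` is reported as blocked, while the HTML page outside that folder stays crawlable.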
Upvotes: 0