Reputation: 2605
I had to noindex PDF files. I have done this many times, so in this case I used a Files directive to add a noindex header via X-Robots-Tag, as Google recommends:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
When I used this before, it worked like a charm. But in this case, I found neither the X-Robots-Tag header itself nor its content (noindex, nofollow) in the response headers. mod_headers was enabled.
I tried
<FilesMatch ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
with no luck.
After much further trial and error, I got it working with
<LocationMatch ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>
But I don't really understand why the rule I used for years stopped working and the rule I blindly tried suddenly works.
Could somebody explain it to me?
Upvotes: 3
Views: 1217
Reputation: 871
The Apache documentation states that FilesMatch takes a regular expression pattern, <FilesMatch regexp>, and is preferred over using <Files ~ "regexp">:
"The <FilesMatch> directive limits the scope of the enclosed directives by filename, just as the <Files> directive does. However, it accepts a regular expression."
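In my experience with regular expressions, this means writing a pattern that covers the whole filename, wildcard included, whereas the plain <Files> directive takes a filename or a shell-style wildcard string rather than a regular expression.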
As for matching all named files in an expression, that means a small tweak is required to your existing code:
<FilesMatch ".+\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
If you expect to have a file named .pdf that you also need to exclude, replace + in that expression with *. This is due to how RegEx matches:
. : Match any character, once.
+ : The previous modifier or block must occur one or more times.
* : The previous modifier or block may occur zero or more times.
This means .+ matches all files with at least one character before .pdf in the filename, and .* matches all files ending in .pdf.
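For illustration, the * variant described above would look roughly like this. It is a sketch of the same rule with only the quantifier changed, not a tested fix for the original problem:

```apache
# Variant using * instead of +: also covers a file named exactly ".pdf",
# because .* allows zero characters before the extension.
<FilesMatch ".*\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```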
As for an explanation of why your Files directive doesn't work:
The Files directive may be overridden by other Files directives appearing later in the same configuration or within a .htaccess file in the directory where you keep the PDF files. Furthermore, the directives are handled in a fixed order, and each step can override the previous ones:
Directory < Files in Directory < .htaccess < Files in .htaccess < Location
So it's most probably a different part of the configuration that overrides the Files directive.
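As a hedged illustration of that kind of override, the sketch below shows a Location section undoing a header set by a file-based section. The /downloads path and the Header unset line are hypothetical and are not taken from the asker's configuration. They only show the mechanism: Location sections are merged after Files sections, and mod_headers applies the merged Header directives in order, so the later one wins.

```apache
# Merged earlier: sets the header for any file whose name ends in .pdf
<FilesMatch "\.pdf$">
    Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

# Merged later: a Location section covering the same URLs takes effect last.
# "/downloads" is a hypothetical path used only for illustration.
<Location "/downloads">
    Header unset X-Robots-Tag
</Location>
```

With a setup like this, a request for a PDF under /downloads would come back without the X-Robots-Tag header even though the FilesMatch rule matched. That is the symptom described in the question.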
Upvotes: 3