Evgeniy

Reputation: 2605

Why doesn't the Files directive work in Apache's httpd.conf?

I needed to noindex PDF files. I have done this many times before, so in this case I used a Files directive to add a noindex header via X-Robots-Tag, as Google recommends:

<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>

Whenever I used this before, it worked like a charm. But this time the response contained neither the X-Robots-Tag header nor its value (noindex, nofollow). mod_headers was enabled.

I tried

<FilesMatch ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

with no luck.

After a lot more trial and error, I got it working with

<LocationMatch ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</LocationMatch>

But I don't really understand why the rule I have used for years stopped working, while the rule I tried blindly suddenly works.

Could somebody explain it to me?

Upvotes: 3

Views: 1217

Answers (1)

RivenSkaye

Reputation: 871

The Apache documentation states that FilesMatch takes a regular expression pattern directly, as in <FilesMatch regexp>, and is preferred over <Files ~ "regexp">. The ~ belongs only to the <Files> form. The documentation says:

The <FilesMatch> directive limits the scope of the enclosed directives by filename, just as the <Files> directive does. However, it accepts a regular expression.

In my experience with RegEx, this means using a wildcard in the pattern to match the whole filename, rather than relying on the normal <Files> directive, which matches on a substring.

To match all named files with one expression, your existing code needs a small tweak:

<FilesMatch ".+\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>

If you expect to have a file named .pdf that you also need to exclude, replace + in that expression with *. This is due to how RegEx matches:

  • . Matches any single character.
  • + The preceding token or group must occur one or more times.
  • * The preceding token or group may occur zero or more times.

This means .+\.pdf$ matches all files with at least one character before .pdf in the filename, and .*\.pdf$ matches all files ending in .pdf.
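As a minimal sketch of the * variant described above (not tested against your server), the block would look like the following. One thing worth knowing: Apache searches for the regular expression anywhere in the filename unless you anchor it with ^, so the bare \.pdf$ pattern from the question should already match the same set of files as .*\.pdf$.

```apache
# Variant with * instead of +: also covers a file literally named ".pdf".
# Note there is no "~" here; <FilesMatch> takes the pattern directly.
<FilesMatch ".*\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```

If neither variant produces the header, the pattern itself is probably not the problem and the cause lies elsewhere in the configuration.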

As for why your Files directive doesn't work: a Files directive may be overridden by other Files directives that appear later in the same configuration, or in a .htaccess file in the directory where you keep the PDF files. Furthermore, the sections are handled in a fixed order, and each later step can override the earlier ones: Directory < Files in Directory < .htaccess < Files in .htaccess < Location. So it's most probably a different part of the configuration that overrides the Files directive.
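To illustrate the ordering, here is a hypothetical sketch of how a later section could cancel the header. The paths /var/www/example and /docs are made up for illustration and are not taken from your configuration; something along these lines may be hiding in a virtual host or an included file.

```apache
# Hypothetical document root: the header is set for PDF files here.
<Directory "/var/www/example">
  <Files "*.pdf">
    Header set X-Robots-Tag "noindex, nofollow"
  </Files>
</Directory>

# <Location> sections are merged after <Files>, so for any request
# under /docs this removes the header that was set above.
<Location "/docs">
  Header unset X-Robots-Tag
</Location>
```

This would also explain why LocationMatch worked for you: being merged last, it gets the final say over anything set in Directory, Files, or .htaccess.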

Upvotes: 3
