PublicDisplayName

Reputation: 317

How to 404 urls in .htaccess?

I'm trying to make all urls for https://www.example.com/page/ redirect to 404.

The problem I'm having is every number after /page/ is redirecting to the homepage, and is getting indexed - causing a duplicate content penalty.

So far I have around 30 pages indexed, which all redirect to the homepage.

I'd like to have any URL which has a number after /page/ to 404 (with the 404 header), so I can deindex all of these pages.

So far I've tried:

Redirect 404 ^page/(.*)$ or Redirect 404 /page/*

Unsurprisingly these haven't worked - where am I going wrong?

Upvotes: 1

Views: 356

Answers (1)

MrWhite

Reputation: 45829

Redirect 404 ^page/(.*)$ or Redirect 404 /page/*

You are close, but the Redirect directive uses simple prefix-matching. It does not use a regular expression (regex) or wildcard match.

to make all urls for https://www.example.com/page/ redirect to 404.

To make all requests for /page/ result in a 404, regardless of whether a number follows it or not, you would use the following:

Redirect 404 /page/

As noted above, the Redirect directive uses simple prefix-matching, so the above serves a 404 for any URL that starts with /page/.

However ....

any URL which has a number after /page/ to 404

To serve a 404 only for URLs of the form /page/<number>, and not for /page/ itself, you would need to use the RedirectMatch directive instead, which matches using a regex rather than prefix-matching.

For example:

RedirectMatch 404 ^/page/\d+$

\d+$ matches 1 or more digits to the end of the URL-path. So the above will match /page/1 and /page/123456 but not /page/ or /page/abc or /page/123z etc.

If instead you wanted to match /page/<something>, where <something> is literally anything, but not match /page/ on its own, then you could use the following:

RedirectMatch 404 ^/page/.

The above matches /page/ at the start of the URL-path followed by at least 1 other character.

Note that whilst we are using the Redirect and RedirectMatch directives here, there is no external redirect (3xx response). The 404 is served by Apache as an internal subrequest. Apache sends the 404 Not Found header.

However, if you specifically want to "deindex" these pages more quickly then consider sending a "410 Gone" instead. This is a stronger signal to search engines that the page is not coming back. In this case you can also use the gone keyword in place of the 410 status code.
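
As a rough sketch of the 410 variant, assuming the same /page/<number> URLs discussed above, something like the following could replace the 404 rule. The commented-out line is the prefix-matching alternative for everything under /page/, so use one or the other, not both:

```apache
# Serve "410 Gone" for /page/<number> only (regex match).
# No target URL is given, because 410 is not a 3xx redirect status.
RedirectMatch gone ^/page/\d+$

# Alternative: serve "410 Gone" for anything that starts with /page/ (prefix match).
# Redirect gone /page/
```

The numeric form (RedirectMatch 410 ^/page/\d+$) should behave the same way as the gone keyword.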

Upvotes: 3
