user3278588

Google console block URLs robots.txt redirection 301

On my website, I have some URLs of the following form: https://www.MyShop.com/648-category-name?n=50&amp%253Bn=10&id_category=81

Such URLs are created when a visitor changes the number of products per page from 20 (the default) to 50.

There is no need to index such URLs. In addition, they could be regarded as duplicate content.

1- In robots.txt, I put the following directive:

2- In the Google Console URL settings, I added the following parameter:

3- However, having done this, I am getting a message in Google Console saying that the URL is blocked (on smartphone, not on desktop). It seems that the Googlebot-mobile crawler:

4- To solve the matter, I wonder whether it is possible to make a 301 redirection.

Does anyone know what line should be added to the .htaccess file to make such a redirection?

Thanks in advance for any help with this.

Patrick

AfroThundr

You would probably want this to target only the bots, for example by matching the user agent:

RewriteCond %{HTTP_USER_AGENT} (googlebot|google-mobile) [NC]

If you want to redirect every URL that has a query string to the bare URL (the same path with the query string removed), you can use the following:

RewriteCond %{QUERY_STRING} .
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
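Putting those pieces together, a complete .htaccess might look like the sketch below. It assumes the file sits in the document root and that mod_rewrite is enabled. RewriteCond lines are ANDed by default and apply only to the RewriteRule directly after them, so the redirect fires only when the request comes from a matching bot and has a query string.

```apache
# Sketch for an .htaccess in the document root (assumes mod_rewrite is loaded)
RewriteEngine On

# Condition 1: the user agent looks like a Google crawler (case-insensitive)
RewriteCond %{HTTP_USER_AGENT} (googlebot|google-mobile) [NC]
# Condition 2: the query string is non-empty
RewriteCond %{QUERY_STRING} .
# Permanent redirect to the same path, discarding the query string (QSD needs Apache 2.4+)
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
```

With this in place, a crawler request for /648-category-name?n=50&id_category=81 should get a 301 pointing at /648-category-name. The redirected request has no query string, so the second condition fails and the rule cannot loop.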

If you only wanted to redirect for a specific query string component, such as n=foo:

RewriteCond %{QUERY_STRING} (^|&)n=(.+)(&|$)
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]

If you're using an Apache version older than 2.4, which doesn't support the QSD flag, drop QSD and append a ? to %{REQUEST_URI} instead.
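For Apache 2.2 and earlier, the strip-everything rule might be written as in the following sketch. Ending the substitution with a bare ? tells mod_rewrite to discard the original query string, which is the older equivalent of QSD.

```apache
RewriteEngine On
RewriteCond %{QUERY_STRING} .
# Trailing "?" replaces the query string with an empty one (pre-2.4 idiom for QSD)
RewriteRule ^ %{REQUEST_URI}? [L,R=301]
```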


Edit 1:

That's very odd. The query string in this URI:

https://www.MyShop.com/648-category-name?%252525252525253Bn=10

contains a semicolon (;) that was percent-encoded as %3B. The percent sign of that escape was then encoded again as %25, over and over (%3B → %253B → %25253B, and so on).

Without addressing how to fix that particular issue, you could modify the regex above so that it also matches the percent-encoded prefix:

RewriteCond %{QUERY_STRING} (^|&)([%A-Za-z0-9]+)n=(.+)(&|$)

Or a simpler, if somewhat less targeted, version:

RewriteCond %{QUERY_STRING} (^|&)(.+)n=(.+)(&|$)

But both patterns would also match any query string parameter whose name happens to end in n, so this URI:

https://www.MyShop.com/648-category-name?somethingn=foo&id_category=42

would be captured as well.
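If the only malformed parameters are the ones produced by that repeated encoding, a tighter sketch could anchor on the encoded semicolon itself. The pattern looks for an optional amp, then a %, then zero or more 25 pairs, then 3B directly before n=. This assumes the damage always looks like the two examples above (amp%253Bn= and %25…3Bn=). The [NC] flag also covers lowercase hex such as %3b.

```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|google-mobile) [NC]
# Matches "amp%253Bn=", "%252525252525253Bn=" and plain "%3Bn=",
# but not "somethingn=" (the parameter name must be an encoded semicolon)
RewriteCond %{QUERY_STRING} (^|&)(amp)?%(25)*3Bn= [NC]
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
```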

Since you're only targeting bots, it might be best to just strip the query strings completely. If this is only an issue on specific parts of the site, you can also narrow down where the rules apply by putting them in a <Location> block. Note that <Location> is only allowed in the main server or virtual-host configuration, not in .htaccess:

<Location /648-category-name>
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (googlebot|google-mobile) [NC]
    # Or any of the other query string regexes above (Apache does not allow trailing comments on a directive line)
    RewriteCond %{QUERY_STRING} .
    RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
</Location>
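Because <Location> is only valid in the main server or virtual-host configuration, an .htaccess version would restrict the path through the RewriteRule pattern instead. In a per-directory context, the pattern is matched against the path relative to that directory, without the leading slash. The sketch below therefore assumes the .htaccess lives in the document root.

```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (googlebot|google-mobile) [NC]
RewriteCond %{QUERY_STRING} .
# In a root .htaccess the pattern sees "648-category-name", not "/648-category-name"
RewriteRule ^648-category-name$ %{REQUEST_URI} [L,R=301,QSD]
```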

Alternatives, which may or may not be feasible for you, would be adding a <link rel="canonical"> tag to the page, as explained in this answer, or adding Disallow: /*?* to your robots.txt to stop crawling of all pages with query strings, as explained in this answer.


Edit 2:

There are more efficient ways to write those rules.

Multiple conditions, joined with Apache's [OR] flag. Anchoring each n= value with (&|$) makes it match that exact value whether another parameter follows it or it ends the query string:

RewriteCond %{QUERY_STRING} (^|&)n=10(&|$) [OR]
RewriteCond %{QUERY_STRING} (^|&)n=20(&|$) [OR]
RewriteCond %{QUERY_STRING} (^|&)n=50(&|$) [OR]
RewriteCond %{QUERY_STRING} (^|&)amp% [OR]
RewriteCond %{QUERY_STRING} (^|&)%25252525
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]

As a single condition, using the regex alternation operator |:

RewriteCond %{QUERY_STRING} (^|&)(n=(10|20|50)(&|$)|amp%|%25252525)
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]

This might matter for performance on high-traffic sites.
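To limit these rules to crawlers, the user-agent check can be placed in front of them, as in the sketch below. A condition without [OR] is ANDed with what follows. The rule therefore redirects only when the user agent matches and the query string contains one of the listed patterns.

```apache
RewriteEngine On
# No [OR] flag: this must match in addition to the next condition
RewriteCond %{HTTP_USER_AGENT} (googlebot|google-mobile) [NC]
# n=10, n=20 or n=50 as a whole parameter, or a parameter starting with amp% or %25252525
RewriteCond %{QUERY_STRING} (^|&)(n=(10|20|50)(&|$)|amp%|%25252525)
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
```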
