Reputation: 101
On my website, some URLs have the following form: https://www.MyShop.com/648-category-name?n=50&%253Bn=10&id_category=81
Such URLs are created when a visitor changes the number of products shown per page from 20 (the default) to 50.
There is no need to index such URLs. In addition, they could be regarded as duplicate content.
1- In robots.txt, I added the following directive:
2- In the Google Search Console URL parameter settings, I added the following parameter:
Does this parameter change page content seen by the user?
I set: Yes: Changes, reorders, or narrows page content
How does this parameter affect page content?
I set: Other
Which URLs with this parameter should Googlebot crawl?
I set: No URLs
3- However, having done this, I am getting a message in Google Search Console saying that the URL is blocked (on smartphone, not on computer). It seems that the Googlebot-mobile crawler:
4- To solve the matter, I wonder whether it is possible to make a 301 redirection
Does anyone know what line should be added to the .htaccess file to make such a redirection?
I thank anyone in advance for any help in this matter.
Patrick
Upvotes: 1
Views: 602
Reputation: 1225
You would probably want this to target bots only, for example by matching the user agent:
RewriteCond %{HTTP_USER_AGENT} (googlebot|google-mobile) [NC]
If you want to redirect every URL that has a query string to the bare URL, you can use the following:
RewriteCond %{QUERY_STRING} .
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
If you only wanted to redirect for a specific query string component, such as n=foo:
RewriteCond %{QUERY_STRING} (^|&)n=(.+)(&|$)
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
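To see which query strings that condition would catch, here is a minimal sketch that runs the same pattern through Python's re module. It assumes Python's regex engine behaves like Apache's PCRE for a pattern this simple; the helper name and the sample query strings are made up for illustration.

```python
import re

# Same pattern as the RewriteCond above. RewriteCond performs an unanchored
# search, which re.search mirrors.
N_PARAM = re.compile(r"(^|&)n=(.+)(&|$)")


def has_n_param(query_string):
    """Return True if a component of the query string starts with n=<value>."""
    return N_PARAM.search(query_string) is not None


if __name__ == "__main__":
    # Hypothetical query strings, without the leading "?".
    for qs in ["n=50", "id_category=81&n=50", "pn=50", "id_category=81"]:
        print(f"{qs!r}: {has_n_param(qs)}")
```

The (^|&) prefix is what keeps a differently named parameter such as pn=50 from matching.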
If you're using an Apache version older than 2.4 that doesn't support the QSD flag, simply append a ? to %{REQUEST_URI} instead.
Edit 1:
That's very odd. The query string in this URI:
https://www.MyShop.com/648-category-name?%252525252525253Bn=10
contains a semicolon ; which was percent-encoded into %3B, but then the percent sign % was encoded again into %25, over and over.
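The repeated encoding can be undone by decoding until the value stops changing. A small sketch using Python's standard urllib.parse.unquote; the function name and the round limit are my own choices, not anything from the site in question:

```python
from urllib.parse import unquote


def fully_unquote(value, max_rounds=20):
    """Percent-decode repeatedly until the value is stable.

    Returns the decoded value and the number of decoding rounds that
    actually changed it. max_rounds is an arbitrary safety limit.
    """
    rounds = 0
    while rounds < max_rounds:
        decoded = unquote(value)
        if decoded == value:
            break
        value = decoded
        rounds += 1
    return value, rounds


if __name__ == "__main__":
    # Query string taken from the URI quoted above.
    decoded, rounds = fully_unquote("%252525252525253Bn=10")
    print(decoded, rounds)
```

Each round strips one layer of %25, which is why the stored URL keeps growing every time it is re-encoded.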
Without addressing how to fix that particular issue, you could modify the regex I stated above to match the percent-encoding as well with:
RewriteCond %{QUERY_STRING} (^|&)([%A-Za-z0-9]+)n=(.+)(&|$)
Or a simpler, if somewhat less targeted, version:
RewriteCond %{QUERY_STRING} (^|&)(.+)n=(.+)(&|$)
But that would also match any query string component that happened to end with n=, so this URI:
https://www.MyShop.com/648-category-name?somethingn=foo&id_category=42
would be captured as well.
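The trade-off between the three patterns is easier to see side by side. The sketch below runs them over a few query strings with Python's re module, again assuming it behaves like Apache's PCRE here. The pattern labels and the my_pagen=3 sample are invented for illustration.

```python
import re

# The three RewriteCond patterns discussed above, labelled for this sketch.
PATTERNS = {
    "plain": re.compile(r"(^|&)n=(.+)(&|$)"),
    "percent-aware": re.compile(r"(^|&)([%A-Za-z0-9]+)n=(.+)(&|$)"),
    "broad": re.compile(r"(^|&)(.+)n=(.+)(&|$)"),
}


def matching_patterns(query_string):
    """Return the labels of all patterns that match the query string."""
    return [name for name, pattern in PATTERNS.items()
            if pattern.search(query_string)]


if __name__ == "__main__":
    samples = [
        "n=50&id_category=81",
        "%253Bn=10",
        "n=50&%253Bn=10&id_category=81",
        "somethingn=foo&id_category=42",
        "my_pagen=3",  # hypothetical parameter containing an underscore
        "id_category=81",
    ]
    for qs in samples:
        print(f"{qs!r}: {matching_patterns(qs)}")
```

Both wider patterns catch the encoded variant that the plain one misses, and both also catch somethingn=foo. Only the broad one matches a key containing characters outside [%A-Za-z0-9].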
Since you're only targeting bots, it might be best to just strip the query strings completely. If this is only an issue on specific parts of the site, you can also narrow down where the rules apply by putting them in a <Location> block (valid only in the server or virtual host configuration, not in .htaccess, where an extra RewriteCond on %{REQUEST_URI} can do the same job):
<Location /648-category-name>
RewriteCond %{HTTP_USER_AGENT} (googlebot|google-mobile) [NC]
# Or any of the other regexes
RewriteCond %{QUERY_STRING} .
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
</Location>
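To make the combined effect of those directives concrete, here is a rough Python model of the decision they encode. It only illustrates the logic and is not how Apache evaluates rules internally; the function name, the path_prefix parameter, and the user-agent strings in the usage lines are assumptions.

```python
import re
from urllib.parse import urlsplit

# Case-insensitive, like the [NC] flag on the user-agent condition.
BOT_UA = re.compile(r"(googlebot|google-mobile)", re.IGNORECASE)


def redirect_target(url, user_agent, path_prefix="/648-category-name"):
    """Return the URL to 301-redirect to, or None if the request is left alone."""
    parts = urlsplit(url)
    if not parts.path.startswith(path_prefix):  # scope of the block
        return None
    if not BOT_UA.search(user_agent):           # bots only
        return None
    if not parts.query:                         # QUERY_STRING must be non-empty
        return None
    # Same path, query string dropped (what QSD does).
    return f"{parts.scheme}://{parts.netloc}{parts.path}"


if __name__ == "__main__":
    url = "https://www.MyShop.com/648-category-name?n=50&%253Bn=10&id_category=81"
    print(redirect_target(url, "Mozilla/5.0 (compatible; Googlebot/2.1)"))
    print(redirect_target(url, "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"))
```

Regular visitors keep their n=50 page size, while matching crawlers are sent to the bare category URL.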
Alternatives to this, which may or may not be feasible for you, would be adding a <link rel="canonical"> tag, as explained in this answer, or adding Disallow: /*?* to your robots.txt to stop all crawling of pages with query strings, as explained in this answer.
Edit 2:
There are more efficient ways to write those rules.
Multiple conditions, separated by the Apache [OR] flag:
RewriteCond %{QUERY_STRING} (^|&)n=10(.+)(&|$) [OR]
RewriteCond %{QUERY_STRING} (^|&)n=20(.+)(&|$) [OR]
RewriteCond %{QUERY_STRING} (^|&)n=50(.+)(&|$) [OR]
RewriteCond %{QUERY_STRING} (^|&)amp%(.+)(&|$) [OR]
RewriteCond %{QUERY_STRING} (^|&)%25252525(.+)(&|$)
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
As a single condition, with the regex | operator:
RewriteCond %{QUERY_STRING} (^|&)(n=(10|20|50)|amp%|%25252525)(.+)(&|$)
RewriteRule ^ %{REQUEST_URI} [L,R=301,QSD]
This might matter for performance reasons on high traffic sites.
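A quick way to gain confidence in a merged pattern is to check that it agrees with the [OR] list on a set of sample query strings. The sketch below does this with Python's re module, assuming it behaves like Apache's PCRE for these patterns. The combined pattern here is my own formulation, with the alternation wrapped around the whole prefix (n=10, n=20, n=50, amp%, %25252525), and the sample strings are invented.

```python
import re

# The five [OR] conditions from the list above.
OR_CONDITIONS = [
    r"(^|&)n=10(.+)(&|$)",
    r"(^|&)n=20(.+)(&|$)",
    r"(^|&)n=50(.+)(&|$)",
    r"(^|&)amp%(.+)(&|$)",
    r"(^|&)%25252525(.+)(&|$)",
]

# One pattern intended to be equivalent to the whole list.
COMBINED = r"(^|&)(n=(10|20|50)|amp%|%25252525)(.+)(&|$)"


def matches_any(query_string):
    """True if at least one of the [OR] conditions matches."""
    return any(re.search(p, query_string) for p in OR_CONDITIONS)


def matches_combined(query_string):
    """True if the single combined pattern matches."""
    return re.search(COMBINED, query_string) is not None


if __name__ == "__main__":
    samples = [
        "n=50&id_category=81",
        "amp%253Bn=10&id_category=81",
        "%252525253Bn=10",
        "%253Bn=10",
        "id_category=81",
        # (.+) needs at least one character after the value,
        # so a bare n=10 is not matched by either form.
        "n=10",
    ]
    for qs in samples:
        print(f"{qs!r}: or-list={matches_any(qs)} combined={matches_combined(qs)}")
```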
Upvotes: 0