Reputation: 70
I have a small magento site which consists of page URLs such as:
http://www.example.com/contact-us.html
http://www.example.com/customer/account/login/
However I also have pages which include filters (e.g. price and colour) and two such examples are:
http://www.example.com/products.html?price=1%2C1000
http://www.example.com/products/chairs.html?price=1%2C1000
The issue is that when Googlebot and the other search engine bots crawl the site, it essentially grinds to a halt because they get stuck in all the "filter links".
So, how can the robots.txt file be configured, e.g.:
User-agent: *
Allow:
Disallow:
To allow all pages like:
http://www.example.com/contact-us.html
http://www.example.com/customer/account/login/
to get indexed, but in the case of http://www.example.com/products.html?price=1%2C1000, to index products.html while ignoring all the content after the ?.
The same should apply for http://www.example.com/products/chairs.html?price=1%2C1000.
I also don't want to have to specify each page in turn, just a rule that ignores everything after the ? but not the main page itself.
Upvotes: 5
Views: 6660
Reputation: 1033
Jim Mischel is correct. Using the wildcards he mentioned, you can block particular query strings from being crawled, bearing in mind that only the major search engines support wildcards in robots.txt.
You can then test your rules before applying them using the Google Webmaster Tools robots.txt testing tool: https://www.google.com/webmasters/tools/robots-testing-tool.
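One caveat when testing locally instead of in Google's tool: not every parser implements wildcard semantics. Python's standard-library `urllib.robotparser`, for instance, matches Disallow paths as literal prefixes, so it will report a wildcard-blocked URL as fetchable. A minimal sketch (using the example.com URLs from the question) illustrating why a stdlib check is not a substitute for Google's tester:

```python
import urllib.robotparser

# Parse the wildcard rule with Python's standard-library parser.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /*?",
])

# The stdlib parser treats the Disallow path as a literal prefix, so the
# wildcard rule does NOT block this filtered URL here, even though
# Googlebot would treat it as disallowed.
print(rp.can_fetch("*", "http://www.example.com/products.html?price=1%2C1000"))  # True
```

In other words, a `True` result from `can_fetch` tells you nothing about how Googlebot will interpret a wildcard rule; use Google's own tester for that.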
Upvotes: 0
Reputation: 1
I'll help. On my site, each post gets two URLs, one without .html and one with .html. If you want to block the duplicate .html versions, you can block them in robots.txt:
Disallow: /*.html
Upvotes: -1
Reputation: 133995
I think this will do it:
User-Agent: *
Disallow: /*?
That will disallow any url that contains a question mark.
If you want to disallow just those that have ?price, you would write:
Disallow: /*?price
See related questions (list on the right) such as:
Restrict robot access for (specific) query string (parameter) values?
How to disallow search pages from robots.txt
Additional explanation:
The syntax Disallow: /*? says, "disallow any URL that has a question mark in it." The / is the start of the path-and-query part of the URL. So if your URL is http://mysite.com/products/chairs.html?manufacturer=128&usage=165, the path-and-query part is /products/chairs.html?manufacturer=128&usage=165. The * says "match any sequence of characters". So Disallow: /*? will match /<anything>?<more stuff>, i.e. anything that has a question mark in it.
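The matching described above can be sketched in a few lines of Python. This is a minimal illustration, assuming Google-style semantics where * matches any run of characters and a rule is an open-ended prefix of the path-and-query; the URLs are the ones from the question:

```python
import re
from urllib.parse import urlsplit

def is_blocked(url: str, disallow: str) -> bool:
    """Return True if the robots.txt `disallow` pattern (where '*'
    matches any run of characters) matches the URL's path-and-query."""
    parts = urlsplit(url)
    path_query = parts.path or "/"
    if parts.query:
        path_query += "?" + parts.query
    # Escape everything except '*', which becomes '.*'; anchor at the
    # start of the path and leave the end open, since robots.txt rules
    # are prefix matches.
    pattern = "^" + ".*".join(re.escape(piece) for piece in disallow.split("*"))
    return re.search(pattern, path_query) is not None

print(is_blocked("http://www.example.com/products/chairs.html?price=1%2C1000", "/*?"))  # True
print(is_blocked("http://www.example.com/contact-us.html", "/*?"))                      # False
print(is_blocked("http://www.example.com/customer/account/login/", "/*?"))              # False
print(is_blocked("http://www.example.com/products.html?price=1%2C1000", "/*?price"))    # True
```

So with Disallow: /*? the two plain pages stay crawlable while every filtered URL is blocked, which is exactly what the question asks for.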
Upvotes: 8
Reputation: 294
You should be able to do:
Disallow: /?price=*
or even:
Disallow: /?*
Upvotes: -1