Christine M. Reaves

Reputation: 70

How can I get robots.txt to block access to URLs after the "?" character but still index the page itself?

I have a small Magento site with page URLs such as:

http://www.example.com/contact-us.html
http://www.example.com/customer/account/login/

However I also have pages which include filters (e.g. price and colour) and two such examples are:

http://www.example.com/products.html?price=1%2C1000
http://www.example.com/products/chairs.html?price=1%2C1000

The issue is that when Googlebot and the other search engine bots crawl the site, crawling essentially grinds to a halt because they get stuck in all the "filter links".

So, how can the robots.txt file be configured, e.g.:

User-agent: *
Allow:
Disallow: 

To allow all pages like:

http://www.example.com/contact-us.html
http://www.example.com/customer/account/login/

to get indexed, but in the case of http://www.example.com/products.html?price=1%2C1000 index products.html and ignore all the content after the ?. The same should apply to http://www.example.com/products/chairs.html?price=1%2C1000.

I also don't want to have to specify each page in turn, just a rule to ignore everything after the ? but not the main page itself.

Upvotes: 5

Views: 6660

Answers (4)

Liam McArthur

Reputation: 1033

Jim Mischel is correct. Using the wildcards he mentions, you can block particular query strings from being crawled, bearing in mind that only the major search engines support wildcards in robots.txt.

You can then test your rules before applying them with the Google Webmaster Tools robots testing tool: https://www.google.com/webmasters/tools/robots-testing-tool.

Upvotes: 0

giasi

Reputation: 1

I'll help.

On my site, each post gets two URLs: one without .html and one with .html.

If you want to block the .html versions, use this rule in robots.txt:

Disallow: .html

Upvotes: -1

Jim Mischel

Reputation: 133995

I think this will do it:

User-Agent: *
Disallow: /*?

That will disallow any url that contains a question mark.

If you want to disallow just those that have ?price, you would write:

Disallow: /*?price

See related questions (list on the right) such as:

Restrict robot access for (specific) query string (parameter) values?

How to disallow search pages from robots.txt

Additional explanation:

The syntax Disallow: /*? says, "disallow any url that has a question mark in it." The / is the start of the path-and-query part of the url. So if your url is http://mysite.com/products/chairs.html?manufacturer=128&usage=165, the path-and-query part is /products/chairs.html?manufacturer=128&usage=165. The * says "match any sequence of characters". So Disallow: /*? will match /<anything>?<more stuff> -- anything that has a question mark in it.
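This matching behavior can be checked locally. The sketch below implements the wildcard semantics Google documents for robots.txt patterns (`robots_pattern_matches` is a hypothetical helper written for illustration, not part of any library; note that Python's built-in urllib.robotparser does plain prefix matching and does not handle *):

```python
import re

def robots_pattern_matches(pattern, path):
    """Return True if a robots.txt Disallow pattern matches a path.

    Translates the pattern to a regex: '*' matches any run of
    characters, a trailing '$' anchors the end of the path, and the
    match is anchored at the start of the path-and-query string.
    """
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

# 'Disallow: /*?' blocks any URL containing a question mark...
assert robots_pattern_matches("/*?", "/products/chairs.html?price=1%2C1000")
# ...but leaves plain pages crawlable.
assert not robots_pattern_matches("/*?", "/contact-us.html")

# 'Disallow: /*?price' blocks only the price-filter variants.
assert robots_pattern_matches("/*?price", "/products.html?price=1%2C1000")
assert not robots_pattern_matches("/*?price", "/customer/account/login/")
```

Under these rules /products/chairs.html itself stays crawlable, which is exactly the behavior the question asks for.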

Upvotes: 8

Lee H

Reputation: 294

You should be able to do:

Disallow: /?price=*

or even:

Disallow: /?*

Upvotes: -1
