I want to stop Google from indexing pages whose URLs look something like http://www.domain-name.com/product-tag/... Will these lines in robots.txt do that successfully?
User-agent: *
Disallow: /product-tag/
Disallow: /product-tag/*
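One way to sanity-check rules like these is Python's standard `urllib.robotparser` module. It only does plain prefix matching and does not implement Google's `*`/`$` wildcard extensions, so this sketch tests just the first Disallow line. For Google a trailing `*` is ignored anyway, so `Disallow: /product-tag/` on its own already covers every URL under that path. The URLs below are hypothetical examples on the question's placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# The rules from the question. The "/product-tag/*" line is left out because
# urllib.robotparser treats "*" literally rather than as a wildcard.
ROBOTS_TXT = """\
User-agent: *
Disallow: /product-tag/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs on the example domain.
for url in (
    "http://www.domain-name.com/product-tag/blue/",
    "http://www.domain-name.com/product-tag/blue/?page=2",
    "http://www.domain-name.com/product/blue-shirt/",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url}: {'crawl allowed' if allowed else 'blocked'}")
```

The two tag URLs are reported as blocked and the product URL as allowed. Note that this only tells you whether crawling is blocked, not whether the pages will drop out of the index.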
I'm asking because Google is indexing these pages and I can't find another way to stop it.
How long does it take for changes to robots.txt to be picked up by search engines?
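A robots.txt Disallow rule will stop Google from crawling those pages, but not necessarily from indexing them: a blocked URL can still be indexed if other pages link to it. In particular, robots.txt will not remove pages that Google has already indexed.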
Instead, add a noindex meta tag to the <head> of these pages and let Google recrawl them so it sees the tag. Do NOT block them in robots.txt at this stage, or Googlebot will never fetch the pages and never see the tag. Once all the pages have dropped out of Google's index (which can take some time), you can block them with robots.txt if you want.
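The tag itself looks like `<meta name="robots" content="noindex">`. The sketch below, assuming you have a page's HTML and your robots.txt text as strings, checks the condition described above: the page must carry noindex *and* Googlebot must be allowed to crawl it, otherwise the tag is never seen. The helper names `has_noindex` and `noindex_visible` are hypothetical; it uses only the standard `html.parser` and `urllib.robotparser` modules, and the latter's simple prefix matching is an approximation of Google's own rule handling.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from <meta name="robots"> (or "googlebot") tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            # content can hold several comma-separated directives, e.g. "noindex, follow".
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html):
    """True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


def noindex_visible(robots_txt, url, html):
    """True only if the page has noindex AND robots.txt lets Googlebot crawl it to see that."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return has_noindex(html) and parser.can_fetch("Googlebot", url)
```

With `Disallow: /product-tag/` still in robots.txt, `noindex_visible` returns `False` for a tag page even when the tag is present, which is exactly the situation to avoid until the pages have left the index.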
The main purpose of robots.txt is to stop Google wasting time crawling those pages. Each site is assigned a crawl budget, so Google only recrawls a limited number of its pages each day. If much of that budget goes to pages you don't want indexed, the pages you do want indexed are not kept as up to date as they could be.
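To get a rough idea of how much crawling goes to tag pages, you could count Googlebot requests in your server's access log. This is a minimal sketch, assuming the Apache/Nginx "combined" log format; the regex and the `googlebot_hits` name are illustrative. Because the user-agent string can be faked, it only approximates Google's real activity (genuine Googlebot can be confirmed with a reverse DNS lookup).

```python
import re
from collections import Counter

# Matches the request and user-agent fields of a "combined" format log line:
# ... "GET /path HTTP/1.1" 200 1234 "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, prefix="/product-tag/"):
    """Count requests claiming to be Googlebot, split into tag pages vs. everything else."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        # Skip lines that don't parse or don't come from (something calling itself) Googlebot.
        if match is None or "Googlebot" not in match.group("agent"):
            continue
        bucket = "tag pages" if match.group("path").startswith(prefix) else "other"
        counts[bucket] += 1
    return counts
```

For example, `with open("access.log") as f: print(googlebot_hits(f))` (file name hypothetical) shows how Googlebot's requests are split between `/product-tag/` URLs and the rest of the site.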
Also, don't rely on robots.txt to hide sensitive files: it is only advisory, and some crawlers (though not Google's) simply ignore it.
Finally, if your product pages are already indexed and these tag pages are duplicates of them (which is why you don't want them indexed), you can add a rel="canonical" link in the page's <head> pointing to the real page, instead of using noindex.
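As a small illustration, here is a hypothetical helper that builds the canonical link element for a duplicate page. The function name and the product URL are assumptions for the example; the output is the standard `<link rel="canonical" href="...">` element.

```python
from html import escape


def canonical_link(canonical_url):
    """Return a <link rel="canonical"> element to place inside a duplicate page's <head>."""
    # escape() converts &, <, > and quotes to entities so the href attribute stays valid.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


# Hypothetical "real" product page that a duplicate page should point to.
print(canonical_link("http://www.domain-name.com/product/blue-shirt/"))
# <link rel="canonical" href="http://www.domain-name.com/product/blue-shirt/">
```

Keep in mind that Google treats rel="canonical" as a strong hint rather than a directive, so it may occasionally pick a different URL as canonical.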