MarioCannistra

Reputation: 275

How to switch off / on indexing in a web page

I'm using Nutch 1.6 and Solr 4.3 on Ubuntu Server 12.04. I would like to switch content indexing on and off. Is there a way to specify this behaviour in my HTML pages so that Solr behaves accordingly?

As an example, when using Google Search Appliance I would put "googleoff" / "googleon" tags around the content on the page that I don't want indexed (headers, footers, copyright strings, etc.).

Thank you.

Upvotes: 0

Views: 248

Answers (2)

Paige Cook

Reputation: 22555

You will need to create a custom plugin for Nutch to accomplish this behavior. Below are some relevant links with examples.
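As a rough illustration of the kind of logic such a plugin might apply before the page text is handed on for indexing, here is a minimal sketch that removes everything between a pair of marker comments. The marker syntax (`<!--googleoff: index-->` / `<!--googleon: index-->`) is borrowed from the Google Search Appliance convention mentioned in the question, and the class and method names (`IndexMarkerStripper`, `strip`) are hypothetical. None of this is part of Nutch or Solr, and wiring it into an actual Nutch plugin is not shown.

```java
import java.util.regex.Pattern;

public class IndexMarkerStripper {
    // Assumption: pages use GSA-style marker comments. Nutch does not
    // recognise these on its own; a custom plugin would have to apply this.
    private static final Pattern NO_INDEX_SPAN = Pattern.compile(
        "<!--\\s*googleoff:\\s*index\\s*-->.*?<!--\\s*googleon:\\s*index\\s*-->",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // An "off" marker with no matching "on" marker: drop the rest of the page.
    private static final Pattern UNCLOSED_SPAN = Pattern.compile(
        "<!--\\s*googleoff:\\s*index\\s*-->.*",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    /** Returns the HTML with every marked "do not index" span removed. */
    public static String strip(String html) {
        String withoutSpans = NO_INDEX_SPAN.matcher(html).replaceAll("");
        return UNCLOSED_SPAN.matcher(withoutSpans).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<p>Body</p>"
            + "<!--googleoff: index--><div>Footer</div><!--googleon: index-->"
            + "<p>More</p>";
        System.out.println(strip(page)); // prints <p>Body</p><p>More</p>
    }
}
```

A regular expression is used here only to keep the sketch short. A real plugin would more likely work on the parsed document tree, which also copes with markers that fall in awkward places in the markup.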

Upvotes: 3

alfeliz

Reputation: 1

There is a text file, "robots.txt", that tells search engines which HTML pages they are or are not allowed to crawl for content. In the linked FAQ, "robots.txt: How to stop indexing", you will find all the information.

Upvotes: 0
