Reputation: 720
In my previous question (Is it possible to store data in solr?) I got the answer that I can store the data for a small index (a few sites) in Solr without using any database. Now I wonder: is it possible to store the full HTML source code of a page in Solr without using any database?
Upvotes: 1
Views: 980
Reputation: 52799
Nutch with Solr is a solution if you want to crawl websites and have them indexed.
The Nutch with Solr tutorial will get you started.
However, Nutch would not preserve the original source code with its HTML tags.
You would need to develop a custom solution that downloads the HTML page and then uses the Solr Extracting Request Handler to feed the HTML file to Solr and extract its contents.
Solr uses Apache Tika to extract the contents from the uploaded HTML file.
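A minimal sketch of that custom step in Python, using only the standard library. It assumes the Extracting Request Handler is mounted at its usual `/update/extract` path; the core URL `http://localhost:8983/solr/mycore`, the helper names, and the choice of the page URL as document id are all hypothetical, so adjust them to your setup.

```python
import urllib.parse
import urllib.request


def build_extract_request(solr_core_url, doc_id, html_bytes):
    """Build a POST that sends raw HTML to Solr's extracting handler.

    literal.id sets the document id; commit=true makes the document
    searchable right away (fine for a test, costly for bulk loads).
    """
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    url = solr_core_url.rstrip("/") + "/update/extract?" + params
    return urllib.request.Request(
        url,
        data=html_bytes,
        headers={"Content-Type": "text/html"},
        method="POST",
    )


def index_page(page_url, solr_core_url):
    """Download one page and hand it to Solr (assumed core URL)."""
    with urllib.request.urlopen(page_url) as resp:
        html_bytes = resp.read()
    req = build_extract_request(solr_core_url, page_url, html_bytes)
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example call (needs a running Solr with the extract handler enabled):
# index_page("http://example.com/", "http://localhost:8983/solr/mycore")
```

Note that this indexes the text Tika extracts, not the markup itself, so on its own it does not keep the original tags.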
You can also look at HTMLStripCharFilterFactory if you are feeding the data to Solr as HTML text.
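If the goal is to keep the full page source, one option is to send the HTML as a plain field value. A stored field returns the value exactly as it was sent, while a char filter such as HTMLStripCharFilterFactory only affects the tokens that get indexed. Here is a rough sketch, assuming a field called `html_source` exists in your schema as a stored field (that name, the core URL, and the helper name are made up for illustration):

```python
import json
import urllib.request


def build_raw_html_update(solr_core_url, doc_id, html_text, field="html_source"):
    """Build a JSON update request that carries the page source as a field value.

    `field` is a hypothetical field name; it must exist in your schema
    and be stored for the original HTML to come back in search results.
    """
    body = json.dumps([{"id": doc_id, field: html_text}]).encode("utf-8")
    url = solr_core_url.rstrip("/") + "/update?commit=true"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example call (needs a running Solr; the core name is an assumption):
# req = build_raw_html_update("http://localhost:8983/solr/mycore",
#                             "http://example.com/", "<html>...</html>")
# urllib.request.urlopen(req)
```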
Upvotes: 4