Is it possible to store the full HTML page source code in Solr?

Question

In my previous question I got the answer that I can store the data for a small index (a few sites) in Solr without using any database (Is it possible to store data in solr?). Now I wonder whether it is also possible to store the full HTML source code of a page in Solr without using any database.

Jayendra · Accepted Answer

Nutch with Solr is a solution if you want to crawl websites and have them indexed.
The Nutch with Solr Tutorial will get you started.
However, Nutch would not keep the original source code with the HTML tags.

You would need to develop a custom solution that downloads the HTML page and then uses the Solr Extracting Request Handler to feed the HTML file to Solr and extract its contents, e.g. at link

Solr uses Apache Tika to extract the contents from the uploaded HTML file.
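
As a rough illustration of the custom approach described above, here is a minimal Python sketch that posts an already-downloaded HTML page to the Extracting Request Handler. The host, port, core name (`mycore`) and the helper names `build_extract_url` and `post_html` are assumptions made up for this example; adjust them to your own setup. It relies on the handler's `literal.<field>` parameter to set the document id and on `commit=true` to make the document searchable right away.

```python
import urllib.parse
import urllib.request

# Assumed endpoint: a local Solr with a core named "mycore" (hypothetical)
# that has the Extracting Request Handler mapped to /update/extract.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/mycore/update/extract"


def build_extract_url(doc_id, base_url=SOLR_EXTRACT_URL, commit=True):
    """Build the request URL; literal.id sets the 'id' field of the document."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return base_url + "?" + urllib.parse.urlencode(params)


def post_html(doc_id, html_bytes, base_url=SOLR_EXTRACT_URL):
    """Send the raw HTML bytes as the request body; Tika parses them on the Solr side."""
    request = urllib.request.Request(
        build_extract_url(doc_id, base_url),
        data=html_bytes,
        headers={"Content-Type": "text/html"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `post_html("page-1", open("page.html", "rb").read())` would then upload one page, assuming Solr is running at that address. Note that this indexes the extracted text; the markup itself is not kept by this step.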

You can also check HTMLStripCharFilterFactory if you are feeding the data to Solr as HTML text.
