James

Reputation: 3184

Solr - Tika - Parsing Content to Enable Highlighting

My understanding is that indexing a PDF, Word, Excel, etc. document through Solr will allow searching but not highlighting. I have this code to perform the indexing:

        String urlString = "http://localhost:8983/solr";
        SolrServer solr = new HttpSolrServer(urlString);

        try {
            for (MultipartFile file : files) {
                // Skip empty upload slots.
                if (file.getOriginalFilename().equals("")) {
                    continue;
                }
                // Save the upload to disk before handing it to Solr.
                File destFile = new File(destPath, file.getOriginalFilename());
                file.transferTo(destFile);

                // Build a fresh request per file so earlier files are not sent again.
                ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
                up.addFile(destFile);
                up.setParam("literal.id", destFile.getAbsolutePath());
                up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

                try {
                    solr.request(up);
                } catch (SolrServerException sse) {
                    sse.printStackTrace();
                }
            }
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }

I have read that, to enable highlighting, I need to "store/parse the content". How can this be done? Thanks for your help.

Upvotes: 0

Views: 946

Answers (1)

Paige Cook

Reputation: 22555

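You will need to modify the schema file (schema.xml) for your Solr instance and set stored="true" for the content field. Solr builds highlight snippets from the stored value of a field, so a field that is only indexed can be searched but not highlighted. I am assuming that you are using the default field settings for the ExtractingRequestHandler and want to return highlight results against that field.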

Please refer to the Field Options By Use Case page for a matrix and notes on which field options must be enabled for highlighting and other features to work correctly.
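Once the field is stored and the documents have been reindexed, highlighting still has to be requested at query time. Here is a minimal sketch of what such a request URL could look like, using only the JDK. The class and method names (HighlightQueryUrl, build) are made up for illustration, and it assumes the standard /select handler and that the extracted text lives in a field named content, so adjust both to your setup. The hl, hl.fl and hl.snippets parameters are standard Solr highlighting parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds a Solr select URL that asks for highlighting.
public class HighlightQueryUrl {

    // baseUrl:   Solr base URL, e.g. http://localhost:8983/solr (assumption)
    // queryText: the user's search terms
    // field:     stored field to highlight, assumed here to be "content"
    static String build(String baseUrl, String queryText, String field)
            throws UnsupportedEncodingException {
        return baseUrl + "/select"
                + "?q=" + URLEncoder.encode(queryText, "UTF-8")
                + "&hl=true"                                      // turn highlighting on
                + "&hl.fl=" + URLEncoder.encode(field, "UTF-8")  // field(s) to highlight
                + "&hl.snippets=3"                                // up to 3 snippets per document
                + "&wt=json";                                     // response format
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build("http://localhost:8983/solr", "annual report", "content"));
    }
}
```

If you are already using SolrJ, the same parameters can be set on a SolrQuery instead of building the URL by hand. The snippets come back in a separate highlighting section of the response, keyed by document id.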

Upvotes: 2
