Amiga500

Reputation: 6131

Querying indexed Solr documents with highlighting

I have a dilemma with Solr, Tika and document indexing. This is my first contact with Solr and Tika, so I am still in the learning phase. So far I have got it to work, and it returns proper highlighting for the results, as expected.

Something does not make sense to me, though. Every time I get results back, I receive the fields that I use (id, name and some more) and the highlights, which behave properly, but I also receive the content field, which I really do not need. Say I upload and index a file that has 600 KB of text: the response contains that full text in content as well, and that slows things down. For learning purposes I use the default schema.xml from the example folder.

While I was struggling to make this work, I made these changes to schema.xml (I added this XML):

<copyField source="features" dest="text"/>
<fieldType name="features" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ManagedStopFilterFactory" managed="english" />
    <filter class="solr.ManagedSynonymFilterFactory" managed="english" />
  </analyzer>
</fieldType>

And that works.

I POST a document to Solr in much the same way as shown on the Tika site:

curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "[email protected]"

My version is dynamic, as I determine literal.id from the document name, but essentially it is the same POST.

I GET documents from Solr like this:

aws.instance:8983/solr/select?q=features:virus&hl.fragsize=50&hl=on&hl.fl=features&hl.maxAnalyzedChars=-1&hl.snippets=20&wt=json&indent=true

That returns a JSON object including the highlights. The trouble is that I am getting the content property as well, and I do not need it.

I am getting ready to write my own schema file and sort out the wrong field that I have used (features).

I made this work, but I know I did it the wrong way; the thing is, I can't see what is wrong.

I know there must be another query to get the highlights, and I also know that features should not be used, since the content field is sufficient.

Upvotes: 0

Views: 196

Answers (1)

MatsLindh

Reputation: 52862

You can decide which fields get returned by supplying the fl parameter: &fl=id,name,etc.
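
As a rough sketch, here is the query from the question with fl added. The host (localhost:8983) and the field list (id,name) are assumptions for illustration; substitute your own endpoint and fields. The highlights come back in their own section of the response, driven by hl.fl, so leaving content out of fl should not remove them.

```shell
# Assumed endpoint; replace with your own Solr host and path.
SOLR_BASE="http://localhost:8983/solr"

# Fields to return for each document. content is left out on purpose.
FIELDS="id,name"

# Same parameters as the query in the question, plus fl.
QUERY_URL="${SOLR_BASE}/select?q=features:virus&fl=${FIELDS}&hl.fragsize=50&hl=on&hl.fl=features&hl.maxAnalyzedChars=-1&hl.snippets=20&wt=json&indent=true"

# Print the URL; to run it against a live server use: curl "$QUERY_URL"
echo "$QUERY_URL"
```

Keeping the field list in a variable makes it easy to add or drop fields without touching the highlighting parameters.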

Upvotes: 1
