Reputation: 6131
I am having a dilemma with SOLR, Tika, and document indexing. This is my first contact with SOLR and Tika, so I am still in the learning phase. So far I have got it to work, and it returns proper highlighting for the results, as expected.
One thing does not make sense to me. Every time I get results back, I receive the fields that I use (id, name, and some more) and highlights that behave properly, but I also receive the content field, which I really do not need. Say I upload and index a file that has 600 KB of text: the response then contains that text in content as well, and that slows things down. For learning purposes, I use the default Schema.xml contained in the example folder.
While struggling to make this work, I made these changes to Schema.xml (I added this XML):
<copyField source="features" dest="text"/>
<fieldType name="features" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ManagedStopFilterFactory" managed="english" />
<filter class="solr.ManagedSynonymFilterFactory" managed="english" />
</analyzer>
</fieldType>
And that works.
I POST a document to SOLR in a similar fashion to the one shown on the Tika site:
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "[email protected]"
My approach is dynamic, as I determine literal.id based on the document name, but essentially it is the same POST.
I GET the document from SOLR like this:
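For illustration, deriving literal.id from the document name could look roughly like the sketch below. The function name `build_extract_url` and the choice to use the file name without its extension as the id are assumptions made for this example, not the actual code used here.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_extract_url(base_url, file_path):
    """Build the /update/extract URL for one document.

    Assumption: the id is the file name without its extension.
    """
    doc_id = Path(file_path).stem
    params = {"literal.id": doc_id, "commit": "true"}
    return f"{base_url}/update/extract?{urlencode(params)}"


# Hypothetical file path, used only to show the resulting URL.
print(build_extract_url("http://localhost:8983/solr", "/tmp/report.pdf"))
```

The file itself would still be sent as a multipart form upload, as in the curl command above.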
aws.instance:8983/solr/select?q=features:virus&hl.fragsize=50&hl=on&hl.fl=features&hl.maxAnalyzedChars=-1&hl.snippets=20&wt=json&indent=true
That returns a JSON object including highlights. The trouble is that I am getting the content property as well, and I do not need it.
I am getting ready to write my own schema file and resolve this issue with the wrong field that I have used (features).
I made this work, but I know I did it the wrong way; the thing is, I can't see what the wrong way is.
I know there must be another query to get highlights, and I also know that features should not be used, since the content field is sufficient.
Upvotes: 0
Views: 196
Reputation: 52862
You can decide which fields get returned by supplying the fl parameter: &fl=id,name,etc.
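As a rough sketch, here is the query from the question with fl added, so that only id and name come back in each document while highlighting stays as it was. The helper name `build_select_url` and the localhost base URL are assumptions for illustration; the highlighting parameters are copied from the question.

```python
from urllib.parse import urlencode


def build_select_url(base_url, query, fields, highlight_field):
    """Build a /select URL that returns only the listed stored fields."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # restricts the fields returned per document
        "hl": "on",
        "hl.fl": highlight_field,
        "hl.fragsize": 50,
        "hl.maxAnalyzedChars": -1,
        "hl.snippets": 20,
        "wt": "json",
        "indent": "true",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Assumed base URL; substitute your own host.
print(build_select_url("http://localhost:8983/solr", "features:virus",
                       ["id", "name"], "features"))
```

The comma and colon are percent-encoded by urlencode (as %2C and %3A), which is equivalent to writing &fl=id,name by hand.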
Upvotes: 1