Reputation: 31
I installed Solr 6.5.1 on my Debian server today, but I am having trouble getting the PDF text content back. Searching works: the document is returned when I query, for example, my name: "juan". However, the highlighted text does not appear in the str results as it is supposed to.
This is the example query:
And this is the result:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
<lst name="params">
<str name="hl.snippets">20</str>
<str name="q">juan</str>
<str name="hl">true</str>
<str name="fl">title</str>
<str name="hl.usePhraseHighlighter">true</str>
<str name="hl.fl">content</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<arr name="title">
<str>CV_Juan_Jara_ultimo</str>
</arr>
</doc>
</result>
<lst name="highlighting">
<lst name="/solr-6.5.1/mydocs/CV_Juan_Jara_ultimo.pdf"/>
</lst>
</response>
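The request URL itself is not shown above, but the parameters echoed under responseHeader indicate roughly what was sent. A minimal sketch in Python that rebuilds such a request from those parameters; the host and port (localhost:8983) are taken from the curl commands in this thread and the core name "ex" from the bin/post command, so the exact URL is an assumption, and build_highlight_query is a hypothetical helper name:

```python
from urllib.parse import urlencode

# Assumed endpoint: host/port from the curl commands in this thread,
# core name "ex" from the bin/post command. The original URL was not preserved.
SOLR_SELECT = "http://localhost:8983/solr/ex/select"


def build_highlight_query(term, hl_field="content"):
    """Return a select URL carrying the parameters echoed in responseHeader."""
    params = [
        ("q", term),                          # search term
        ("fl", "title"),                      # fields returned for each doc
        ("hl", "true"),                       # turn highlighting on
        ("hl.fl", hl_field),                  # field to build snippets from
        ("hl.snippets", "20"),                # max snippets per field
        ("hl.usePhraseHighlighter", "true"),
        ("wt", "xml"),                        # response format
    ]
    return SOLR_SELECT + "?" + urlencode(params)


url = build_highlight_query("juan")
print(url)
```

hl_field is a parameter so the same helper can point hl.fl at a different field (for example _text_) when testing which field actually yields snippets.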
Additionally, the log shows all the PDF text, so I assume it was indexed correctly (I indexed the PDF with the command: bin/post -c ex mydocs/CV_Juan_Jara_ultimo.pdf).
I added the "content" field to the schema, using curl:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"add-field" : {
"name":"text",
"type":"text_general",
"indexed":"true",
"stored":"false",
"multiValued":"true"
}
}' localhost:8983/solr/ex/schema
Do you know what could be wrong?
All I want to do is search for a topic in my PDF and get all the results highlighted like this:
Upvotes: 2
Views: 1054
Reputation: 31
SOLVED: the solution that finally worked for me was to replace the _text_ field in the schema with this curl command:
curl -X POST -H 'Content-type:application/json' --data-binary '{
"replace-field" : {
"name":"_text_",
"type":"text_general",
"indexed":"true",
"stored":"true",
"multiValued":"true"
}
}' http://localhost:8983/solr/ex/schema
This is because the _text_ field comes with "stored":"false" by default.
NOTE: Remember to index all files into your core again if you indexed them before replacing this schema field.
Upvotes: 1
Reputation: 1114
It is a very common and simple mistake:
"stored":"false" should be "stored":"true" for the 'content' field.
Currently, all the highlighters require the field to be stored in order to be used [1].
[1] https://cwiki.apache.org/confluence/display/solr/Highlighting
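To confirm what the schema currently says before re-indexing, the field definition can be fetched from the Schema API (GET /solr/ex/schema/fields/_text_ on the host used in this thread) and inspected. A minimal sketch in Python, assuming the response is JSON with the definition under a "field" key; the two sample dictionaries are hand-written from the values quoted in this thread, not captured Solr output, and explicit_stored_flag is a hypothetical helper name:

```python
def explicit_stored_flag(schema_field_response):
    """Return the field's explicit "stored" setting as a bool.

    Returns None when the definition does not set "stored" itself,
    in which case the value is inherited and this sketch cannot tell.
    """
    field = schema_field_response.get("field", {})
    if "stored" not in field:
        return None
    value = field["stored"]
    if isinstance(value, bool):
        return value
    # The curl commands in this thread send "true"/"false" as strings.
    return str(value).strip().lower() == "true"


# Hand-written samples modeled on the thread (assumed response shape).
before = {"field": {"name": "_text_", "type": "text_general",
                    "indexed": True, "stored": False, "multiValued": True}}
after = {"field": {"name": "_text_", "type": "text_general",
                   "indexed": True, "stored": True, "multiValued": True}}

print(explicit_stored_flag(before))  # False: highlighting has nothing to read
print(explicit_stored_flag(after))   # True: snippets can be built
```

If the flag only becomes true after documents were already posted, those documents still have no stored text and need to be posted again.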
Upvotes: 1