Reputation: 171
I am using Apache Solr 6.6.0 in order to build a search engine by recursively indexing all files in a folder.
I do it as follows: 1) I create an index based on the cloud example. 2) I index all files in the given folder.
bin\solr start -e cloud -noprompt
java -Dc=gettingstarted -Dauto=yes -Ddata=files -Drecursive=yes -jar example\exampledocs\post.jar <path_to_folder>
Later, when I run a query in the user interface, the response lists the top matches but does not include the document content. After some research, I found a field named "_text_" and its configuration in the managed-schema file:
<field name="_text_" type="text_general" multiValued="true" indexed="true" stored="false"/>
As you can see, the field is not stored, which I think is the reason why the response does not include the content.
Am I on the right track? If so, how can I edit the configuration of this field? Should I delete it and create a new one with the same name and with stored=true?
Thank you.
Upvotes: 0
Views: 95
Reputation: 2764
The _text_ field is not supposed to be stored because it is used as a "catch all" field. So first, you should check the Solr configuration to make sure that the field receives only the file content. If so, then you could mark that field as stored.
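If you do go that way, you don't have to delete and recreate the field by hand: the Schema API has a replace-field command that redefines an existing field. Here is a minimal sketch of it, not a definitive recipe. The base URL `http://localhost:8983/solr` is an assumption about where your cloud example listens, the collection name `gettingstarted` comes from your post.jar command, and the helper function names are made up for illustration. The other field attributes are copied from your managed-schema line.

```python
import json
import urllib.request


def build_replace_field_payload(stored=True):
    """Build the Schema API 'replace-field' command for the _text_ field.

    Everything except 'stored' mirrors the definition quoted in the question.
    """
    return {
        "replace-field": {
            "name": "_text_",
            "type": "text_general",
            "multiValued": True,
            "indexed": True,
            "stored": stored,
        }
    }


def schema_url(base_url, collection):
    """Return the Schema API endpoint for a collection."""
    return f"{base_url.rstrip('/')}/{collection}/schema"


def post_schema_change(base_url, collection, payload):
    """POST the command to Solr and return the decoded JSON response.

    Needs a running Solr instance; base_url and collection are assumptions
    about your setup.
    """
    request = urllib.request.Request(
        schema_url(base_url, collection),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

You would call it as `post_schema_change("http://localhost:8983/solr", "gettingstarted", build_replace_field_payload())`. Stored values are written at index time, so documents that are already indexed will not show the content until you run post.jar on the folder again.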
But, generally speaking, file contents are only indexed, not stored, because
So, in other words: use Solr for search and, once you have the metadata of a given item, use its identifier to look up and view the corresponding content in some other system. This is the usual scenario, especially when dealing with unstructured data such as txt files.
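As a rough sketch of that pattern: ask Solr only for the id of each match, then load the content yourself. This assumes that the id of each document indexed through post.jar is the path of the original file, which you should verify in your own query results. The base URL, the collection name and the helper names are likewise assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(base_url, collection, query, rows=10):
    """Build a /select URL that returns only the id of each match."""
    params = urllib.parse.urlencode(
        {"q": query, "fl": "id", "rows": rows, "wt": "json"}
    )
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


def extract_ids(solr_response):
    """Pull the id values out of a decoded Solr JSON response."""
    return [doc["id"] for doc in solr_response["response"]["docs"]]


def search_ids(base_url, collection, query, rows=10):
    """Run the query against a live Solr instance and return matching ids."""
    with urllib.request.urlopen(
        build_select_url(base_url, collection, query, rows)
    ) as response:
        return extract_ids(json.load(response))


def read_content(doc_id):
    """Read the file behind an id, assuming the id is a readable text file path."""
    with open(doc_id, encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

For example, `search_ids("http://localhost:8983/solr", "gettingstarted", "some words")` gives you the identifiers, and `read_content` is where your "other system" plugs in. For binary formats such as PDF you would need a real extractor there instead of a plain file read.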
Upvotes: 1