Reputation: 349
I am new to Solr indexing. I am using Solr 5.5 and indexed a PDF file with the following command:
#bin/post -c gettingstarted /home/ubuntu/pdf.pdf
I have since deleted the source PDF file. Is there any way I can extract the PDF file from Apache Solr? I can see that it is indexed from this URL:
http://localhost:8983/solr/gettingstarted/select?q=*.pdf
Thanks in advance.
Upvotes: 0
Views: 990
Reputation: 1953
If the file was indexed properly, the PDF's text is indexed by default into a field named content, provided that field is declared correctly in the schema. So search for some keyword (or *) against the content field.
Ex:
q=content:keyword
(where keyword is a term that is present in the PDF)
http://localhost:8983/solr/gettingstarted/select?q=content:*
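The query above can also be issued from a script. The following is a minimal Python sketch, assuming the collection name and port from the question, a stored field named content, and JSON output via wt=json. The helper names (build_select_url, extract_content, fetch_content) are hypothetical. Note that this returns the text extracted at index time, not the original PDF file.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the question; adjust for your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def build_select_url(collection, query, fields=("id", "content"), rows=10):
    """Build a /select URL that asks Solr for the given stored fields as JSON."""
    params = {"q": query, "fl": ",".join(fields), "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/{collection}/select?{urlencode(params)}"


def extract_content(response):
    """Collect the 'content' values from a parsed JSON select response."""
    docs = response.get("response", {}).get("docs", [])
    texts = []
    for doc in docs:
        value = doc.get("content", [])
        # A multiValued field comes back as a list; a single-valued one as a string.
        if isinstance(value, str):
            value = [value]
        texts.append("\n".join(value))
    return texts


def fetch_content(collection, query="content:*"):
    """Run the query against a live Solr instance (requires Solr to be running)."""
    with urlopen(build_select_url(collection, query)) as resp:
        return extract_content(json.load(resp))
```

Calling fetch_content("gettingstarted") should return one string per matching document, but only if the content field was declared with stored="true" when the PDF was indexed.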
If the content field is undefined, add a field definition to the schema file.
Ex: Field name declaration
<field name="content" type="text_general" indexed="true" stored="true" multiValued="true"/>
Field type definition
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
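If the collection uses a managed schema rather than a hand-edited schema file, the same field can probably be added through Solr's Schema API instead. The sketch below is an assumption-laden illustration: it assumes a managed schema, the host and collection from the question, and an existing text_general field type. The helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Base URL taken from the question; adjust for your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def add_field_command(name="content", field_type="text_general"):
    """Build the Schema API 'add-field' command mirroring the XML declaration."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "indexed": True,
            "stored": True,       # stored="true" is what makes the text retrievable
            "multiValued": True,
        }
    }


def post_schema_command(collection, command):
    """POST a command to /schema (requires a running Solr with a managed schema)."""
    request = Request(
        f"{SOLR_BASE}/{collection}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request) as resp:
        return json.load(resp)
```

For example, post_schema_command("gettingstarted", add_field_command()) would send the request. Documents indexed before the field existed would need to be re-indexed for the field to be populated.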
Upvotes: 1