SOLR - Missing configuration: Unsupported ContentType: text/html; Unsupported ContentType: application/pdf

Question

I have SOLR installed and running on Windows, and I am following the Quick Start tutorial from the SOLR website. Using the post.jar file, I tried to index the documents listed under /solr/docs and got the following error:

ERROR - 2016-05-11 16:35:16.772; [c:gettingstarted s:shard2 r:core_node1 x:gettingstarted_shard2_replica1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Invalid UTF-8 middle byte 0xe3 (at char #10, byte #-1)

I then tried to index one file at a time, starting with a PDF and then an HTML file. Below are the commands I used and the exceptions I saw:

java -Dc=gettingstarted -Dtype=application/pdf -jar example/exampledocs/post.jar scandocs/

ERROR - 2016-05-16 16:17:55.992; [c:gettingstarted s:shard2 r:core_node1 x:gettingstarted_shard2_replica1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unsupported ContentType: application/pdf  Not in: [application/xml, application/csv, application/json, text/json, text/csv, text/xml, application/javabin]

java -Dc=gettingstarted -Dtype=text/html -jar example/exampledocs/post.jar scandocs/

ERROR - 2016-05-16 16:19:03.601; [c:gettingstarted s:shard2 r:core_node1 x:gettingstarted_shard2_replica1] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Unsupported ContentType: text/html  Not in: [application/xml, application/csv, application/json, text/json, text/csv, text/xml, application/javabin]

All I have under the /scandocs folder is an HTML file. It seems as if my SOLR instance is not configured to read HTML/PDF documents, but the tutorial talks about indexing a bunch of rich documents without mentioning any configuration.

I would really appreciate it if anyone could help me with the configuration I need here.


Answers (1)
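
A possible explanation, going by the error text: the list after "Not in:" contains only XML, CSV, JSON and javabin types, which suggests the files were sent to the plain update handler rather than to the rich-document extraction handler. Forcing the type with -Dtype does not change where post.jar sends the file. The sketch below uses post.jar's -Dauto=yes option, which lets the tool guess each file's content type from its extension and send rich documents to /update/extract. It assumes the command is run from the SOLR install directory, that the gettingstarted collection from the Quick Start is running, and that its configuration includes the extraction handler; none of that is confirmed by the question. The variable POST_CMD is just an illustrative name.

```shell
# Assumption: current directory is the SOLR install directory and the
# "gettingstarted" collection from the Quick Start is up.
# -Dauto=yes replaces -Dtype=...: post.jar guesses the content type per file
# and routes rich documents (PDF, HTML, ...) to /update/extract.
POST_CMD="java -Dc=gettingstarted -Dauto=yes -jar example/exampledocs/post.jar scandocs/"

# Print the command for review; to actually run it, use: eval "$POST_CMD"
echo "$POST_CMD"
```

The java command line itself is the same in a Windows command prompt; only the variable assignment and echo are shell-specific. The earlier "Invalid UTF-8 middle byte" error would also be consistent with binary files being posted as XML, which is what post.jar does by default when neither -Dauto nor -Dtype is given.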
