Reputation: 91
Hi, I am struggling to load my data into Solr with the Data Import Handler. I start a Solr server by running the following command in the server folder:
solr start
This lets me open Solr on localhost, where a core that I set up previously is displayed.
I then edited the files solrconfig.xml and schema.xml.
In solrconfig.xml I added the following lines:
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*.jar" />
<schemaFactory class="ClassicIndexSchemaFactory"/>
and
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler" startup="lazy">
<lst name="defaults">
<str name="config">data-config.xml</str>
</lst>
</requestHandler>
In schema.xml (renamed from the managed-schema file) I added
<field name="_version_" type="plong" indexed="true" stored="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="revision" type="pint" indexed="true" stored="false"/>
<field name="user" type="string" indexed="true" stored="false"/>
<field name="userId" type="pint" indexed="true" stored="false"/>
<field name="text" type="text_en" indexed="true" stored="false"/>
<uniqueKey>id</uniqueKey>
Then I created a data-config.xml file with the following code
<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8"/>
<document>
<entity name="page"
processor="XPathEntityProcessor"
stream="true"
forEach="/mediawiki/page"
url="/Volumes/BACKUP/enwiki-latest-pages-articles.xml"
transformer="RegexTransformer,DateFormatTransformer"
>
<field column="id" xpath="/mediawiki/page/id" />
<field column="title" xpath="/mediawiki/page/title" />
<field column="revision" xpath="/mediawiki/page/revision/id" />
<field column="user" xpath="/mediawiki/page/revision/contributor/username" />
<field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
<field column="text" xpath="/mediawiki/page/revision/text" />
<field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
<field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
</entity>
</document>
</dataConfig>
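Since Solr will refuse to load a config file that is not well-formed XML, it can help to check the edited files before reloading the core. The following is only a minimal sketch using Python's standard library; the function name `check_well_formed` and the file paths in the usage comment are my own assumptions, not part of Solr. It checks well-formedness only, not whether Solr accepts the elements and attributes.

```python
import sys
import xml.etree.ElementTree as ET


def check_well_formed(xml_data):
    """Return None if xml_data (str or bytes) parses as XML,
    otherwise return the parser's error message."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    # Hypothetical usage: python check_xml.py conf/data-config.xml conf/solrconfig.xml
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            problem = check_well_formed(handle.read())
        if problem is None:
            print(path, "is well-formed")
        else:
            print(path, "is NOT well-formed:", problem)
```

An unclosed element such as a missing closing tag shows up as a parse error with a line and column number.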
The XML I want to index is stored on an external hard drive attached to my computer. Everything seems to work until I open the following URL in my browser
http://localhost:8983/solr/wiki/dataimport?command=full-import
and the following is shown
Does anyone know how to fix this? I'm using Solr 7.7, and all the questions on Stack Overflow seem to be about earlier versions. The tutorial I am trying to follow is https://www.youtube.com/watch?v=2VkFQTqrRYo&t=310s, which is old, so I think that's why I'm getting this error.
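For anyone who wants to issue the same request from a script instead of the browser, here is a rough sketch. It assumes the core name `wiki` and the default port from the URL above; the helper names `dih_url` and `run_dih` are made up for illustration. `command=status` and `wt=json` are standard Data Import Handler and Solr request parameters.

```python
import json
import urllib.parse
import urllib.request


def dih_url(core, command, base="http://localhost:8983/solr"):
    """Build a Data Import Handler request URL for a core and a command."""
    query = urllib.parse.urlencode({"command": command, "wt": "json"})
    return f"{base}/{core}/dataimport?{query}"


def run_dih(core, command):
    """Send the request and return the decoded JSON response.

    Raises urllib.error.HTTPError if Solr answers with an error status,
    for example when the /dataimport handler failed to load.
    """
    with urllib.request.urlopen(dih_url(core, command)) as response:
        return json.load(response)


# Hypothetical usage against a running Solr instance:
#   run_dih("wiki", "full-import")   # start the import
#   run_dih("wiki", "status")        # poll progress afterwards
```

The full import runs in the background, so the first call returns quickly and the `status` command is what reports progress.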
Upvotes: 2
Views: 5016
Reputation: 91
It turns out all I needed to do was change the following in solrconfig.xml:
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"
(false instead of true)
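As far as I understand the Solr Reference Guide's section on schemaless mode, the same `update.autoCreateFields` property can also be set through the Config API with a `set-user-property` command instead of editing the default by hand. The sketch below only builds that request; the function name is hypothetical, the core name `wiki` and port are taken from the question, and I have not verified it against this exact setup.

```python
import json
import urllib.request


def autocreate_fields_request(core, enabled, base="http://localhost:8983/solr"):
    """Build a Config API request that sets update.autoCreateFields.

    Assumption: the core's Config API is reachable at <base>/<core>/config.
    """
    payload = {
        "set-user-property": {
            "update.autoCreateFields": "true" if enabled else "false"
        }
    }
    return urllib.request.Request(
        f"{base}/{core}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage against a running Solr instance:
#   request = autocreate_fields_request("wiki", enabled=False)
#   with urllib.request.urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```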
Upvotes: 1
Reputation: 8658
The error says it all: a ClassNotFoundException. Check your classpath; it looks like DataImportHandler is not on it.
<lib dir="../../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
After changing the configuration, restart the Jetty server.
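One way to see whether a `<lib>` regex would pick up the jar at all is to test it against the file names in the dist folder. This is only a sketch: the jar names below are a made-up listing in the naming style of the question's `<lib>` line (the exact version number is an assumption), and it assumes the regex has to match the whole file name.

```python
import re


def matching_jars(pattern, jar_names):
    """Return the jar names that the pattern matches in full."""
    return [name for name in jar_names if re.fullmatch(pattern, name)]


# Hypothetical contents of the dist/ folder of a Solr 7.7 install.
jars = [
    "solr-core-7.7.0.jar",
    "solr-dataimporthandler-7.7.0.jar",
    "solr-dataimporthandler-extras-7.7.0.jar",
]

# Pattern without the "apache-" prefix.
print(matching_jars(r"solr-dataimporthandler-.*\.jar", jars))

# Pattern with the "apache-" prefix.
print(matching_jars(r"apache-solr-dataimporthandler-.*\.jar", jars))
```

With this listing, the first pattern matches both dataimporthandler jars and the second matches nothing, so it is worth comparing the regex against the actual file names in your own dist folder.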
Upvotes: 0