rhanly

Reputation: 91

Uploading Data to Solr with Data import handler

Hi, I am struggling to upload my data into Solr with the Data Import Handler. I start a Solr server by running the following command in the server folder:

solr start

This lets me open the Solr admin page on localhost, where a core that I set up previously is displayed.

[Screenshot: Solr running on my computer]

I have then edited the files solrconfig.xml and schema.xml

In solrconfig.xml I added the following lines:

<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*.jar" /> 

<schemaFactory class="ClassicIndexSchemaFactory"/>

and

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler" startup="lazy">
<lst name="defaults">
    <str name="config">data-config.xml</str>
</lst>
</requestHandler>

In schema.xml (renamed from the managed-schema file) I added

<field name="_version_" type="plong" indexed="true" stored="true"/>
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="string" indexed="true" stored="true"/>
<field name="revision" type="pint" indexed="true" stored="false"/>
<field name="user" type="string" indexed="true" stored="false"/>
<field name="userId" type="pint" indexed="true" stored="false"/>
<field name="text" type="text_en" indexed="true" stored="false"/>
<uniqueKey>id</uniqueKey>

Then I created a data-config.xml file with the following code

<dataConfig>
<dataSource type="FileDataSource" encoding="UTF-8"/>
<document>
    <entity name="page"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/mediawiki/page"
            url="/Volumes/BACKUP/enwiki-latest-pages-articles.xml"
            transformer="RegexTransformer,DateFormatTransformer"
            >
        <field column="id" xpath="/mediawiki/page/id" />
        <field column="title" xpath="/mediawiki/page/title" />
        <field column="revision" xpath="/mediawiki/page/revision/id" />
        <field column="user" xpath="/mediawiki/page/revision/contributor/username" />
        <field column="userId" xpath="/mediawiki/page/revision/contributor/id" />
        <field column="text" xpath="/mediawiki/page/revision/text" />
        <field column="timestamp" xpath="/mediawiki/page/revision/timestamp" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
        <field column="$skipDoc" regex="^#REDIRECT .*" replaceWith="true" sourceColName="text"/>
    </entity>
</document>
</dataConfig>
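To make the intent of this data-config.xml easier to follow, here is a minimal Python sketch of roughly what the entity describes: walk the dump one `<page>` at a time, pull out the same fields as the `xpath` attributes, and skip pages whose text matches the `$skipDoc` regex. It is only an illustration, not how the Data Import Handler is implemented. The `iter_pages` function, the `FIELDS` mapping and the two-page sample are made up for this example, and the sample has no XML namespace (a real MediaWiki dump declares one).

```python
import io
import re
import xml.etree.ElementTree as ET

# Same pattern as the $skipDoc rule in data-config.xml.
REDIRECT = re.compile(r"^#REDIRECT .*")

# Column name -> path relative to each <page>, mirroring the <field> xpaths.
FIELDS = {
    "id": "id",
    "title": "title",
    "revision": "revision/id",
    "user": "revision/contributor/username",
    "userId": "revision/contributor/id",
    "text": "revision/text",
}


def iter_pages(source):
    """Yield one dict per <page> element, skipping redirect pages."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "page":
            continue
        doc = {column: elem.findtext(path) for column, path in FIELDS.items()}
        elem.clear()  # drop the finished subtree so memory use stays flat
        if doc["text"] and REDIRECT.match(doc["text"]):
            continue  # the sketch's equivalent of $skipDoc = true
        yield doc


# Hypothetical sample input, not taken from a real dump.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Solr</title>
    <id>1</id>
    <revision>
      <id>10</id>
      <contributor><username>alice</username><id>7</id></contributor>
      <text>Apache Solr is a search platform.</text>
    </revision>
  </page>
  <page>
    <title>SOLR</title>
    <id>2</id>
    <revision>
      <id>11</id>
      <contributor><username>bob</username><id>8</id></contributor>
      <text>#REDIRECT [[Solr]]</text>
    </revision>
  </page>
</mediawiki>"""

docs = list(iter_pages(io.BytesIO(SAMPLE)))
print(docs)
```

Running something like this against a small slice of the dump is a quick way to check that the paths and the redirect pattern select what you expect before starting a full import.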

The XML dump I want to index (the url attribute in data-config.xml) is stored on an external hard drive attached to my computer. Everything seems to work until I type the following into my browser

http://localhost:8983/solr/wiki/dataimport?command=full-import

and the following is shown

[Screenshot: full-import logging output]

Does anyone know how to fix this? I'm using Solr 7.7, and all the questions on Stack Overflow seem to be about earlier versions. The tutorial I am trying to follow is https://www.youtube.com/watch?v=2VkFQTqrRYo&t=310s, which is old, so I think that may be why I'm getting this error.
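For anyone who wants to script the same request instead of using the browser, here is a small Python sketch that builds Data Import Handler URLs for the `wiki` core from the question. `full-import` and `status` are standard DIH commands; the `dataimport_url` helper and its defaults are just names chosen for this example.

```python
from urllib.parse import urlencode


def dataimport_url(command, core="wiki", base="http://localhost:8983/solr", **params):
    """Build a /dataimport URL for the given DIH command and extra parameters."""
    query = urlencode({"command": command, **params})
    return f"{base}/{core}/dataimport?{query}"


full_import = dataimport_url("full-import")
status = dataimport_url("status", wt="json")

print(full_import)
print(status)

# To actually send the request (needs a running Solr instance):
# from urllib.request import urlopen
# print(urlopen(status).read().decode("utf-8"))
```

Polling the `status` URL after starting a `full-import` shows whether the import is still running or has failed, which is often easier to read than the browser output.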

Upvotes: 2

Views: 5016

Answers (2)

rhanly

Reputation: 91

It turns out all I needed to do was change this line in solrconfig.xml:

<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:false}"

(false instead of true)
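The `${update.autoCreateFields:false}` value uses Solr's property substitution syntax, `${name:default}`: if the property is defined it wins, otherwise the default after the colon is used. The Python sketch below only illustrates that rule; the `resolve` function is a made-up helper, not Solr code.

```python
import re

# Matches ${name} or ${name:default}.
PLACEHOLDER = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve(value, properties):
    """Replace each ${name:default} with the property value or its default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        if name in properties:
            return properties[name]
        if default is None:
            raise KeyError(f"no value or default for {name}")
        return default

    return PLACEHOLDER.sub(substitute, value)


attr = "${update.autoCreateFields:false}"
print(resolve(attr, {}))                                    # falls back to the default
print(resolve(attr, {"update.autoCreateFields": "true"}))   # the property overrides it
```

So editing the default only changes the behaviour when `update.autoCreateFields` is not set somewhere else for the core.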

Upvotes: 1

Abhijit Bashetti

Reputation: 8658

The error says it all: a ClassNotFoundException. Check your classpath; it looks like DataImportHandler is not on it.

<lib dir="../../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />

After making the configuration changes, restart the Jetty server.
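A quick way to check whether a `<lib>` regex would pick up the handler jar is to test the pattern against the file names in the dist folder. The Python sketch below does that under two assumptions: that the regex has to match the whole file name, and that the jars are named as in the hypothetical `DIST` list (adjust it to whatever is actually in your dist directory). The `matching_jars` helper is a name invented for this example.

```python
import re


def matching_jars(filenames, pattern):
    """Return the file names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in filenames if regex.fullmatch(name)]


# Assumed file names for a Solr 7.7 dist folder; check your own installation.
DIST = [
    "solr-core-7.7.0.jar",
    "solr-dataimporthandler-7.7.0.jar",
    "solr-dataimporthandler-extras-7.7.0.jar",
]

question_pattern = r"solr-dataimporthandler-.*.jar"
answer_pattern = r"apache-solr-dataimporthandler-.*\.jar"

print(matching_jars(DIST, question_pattern))
print(matching_jars(DIST, answer_pattern))
```

If the jars in your dist folder do not start with `apache-`, the pattern needs to be adapted to the actual file names before the restart will help.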

Upvotes: 0
