Saurabh

Reputation: 940

Solr - configuring schema.xml for XML data

I am trying to index the Wikitravel data using Solr installed on Windows. Below is a sample of the input data:

<?xml version="1.0" encoding="UTF-8"?>

<add> 
  <page> 
    <title>3Days 2Night Chiang Mai to Chiang Rai</title>  
    <id>83509</id>  
    <revision> 
      <id>1305791</id>  
      <timestamp>2009-11-27T10:35:53Z</timestamp>  
      <contributor> 
        <username>Texugo</username>  
        <id>7666</id>  
        <realname/> 
      </contributor>  
      <comment>[[3Days 2Night Chiang Mai to Chiang Rai]] moved to [[Chiang Mai to Chiang Rai in 3 days]]</comment>  
      <text xml:space="preserve">#REDIRECT [[Chiang Mai to Chiang Rai in 3 days]]</text> 
    </revision> 
  </page> 
</add>

In my schema.xml, I have made the following changes:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="comments" type="text_general" indexed="true" stored="true"/>
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true"/>

<uniqueKey>id</uniqueKey>

Posting the file reports no error, but the data does not show up in the Solr web UI, and I cannot see any error in the logs either.

$ java -jar post.jar wiki.xml
SimplePostTool version 1.5
Posting files to base url http://localhost:8983/solr/update using content-type application/xml..
POSTing file wiki.xml
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/update..
Time spent: 0:00:00.342

Upvotes: 0

Views: 190

Answers (2)

Alexandre Rafalovitch

Reputation: 9789

As @notdang said, Solr's input XML has a particular form. You can:

  1. Send the data in the XML format Solr expects.
  2. Use DataImportHandler (DIH), which can parse arbitrary XML.
  3. Pre-process the XML with XSLT on the way in so that it looks like the XML Solr expects.
  4. Convert the data to JSON and send that instead.

I suspect that option 2 (DataImportHandler) might be the easiest if you are using third-party XML files. DIH can also import very large XML files, because it processes them as it reads them, whereas posting a large file to Solr may hit a size limit.
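For option 4, here is a rough sketch of the pre-processing step in Python. It assumes the input looks like the sample in the question (`<page>` elements under a single root) and that the target fields are the ones from the question's schema (`id`, `title`, `comments`, `text`). The function name `pages_to_json_docs` is made up for illustration. The result is a JSON array of flat documents, which is the shape Solr's JSON update handler accepts; how you post it depends on your Solr version and setup.

```python
import json
import xml.etree.ElementTree as ET


def pages_to_json_docs(wiki_xml: str) -> str:
    """Build a JSON array with one flat document per <page> element."""
    docs = []
    for page in ET.fromstring(wiki_xml).iter("page"):
        # Field names on the left come from the question's schema.xml;
        # the paths on the right follow the sample input (an assumption).
        candidate = {
            "id": page.findtext("id"),
            "title": page.findtext("title"),
            "comments": page.findtext("revision/comment"),
            "text": page.findtext("revision/text"),
        }
        # Skip missing or empty elements instead of sending nulls.
        docs.append({name: value for name, value in candidate.items() if value})
    return json.dumps(docs, ensure_ascii=False)
```

Note that `page.findtext("id")` only looks at direct children of `<page>`, so it picks up the page id and not the revision or contributor id.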

Upvotes: 1

notdang

Reputation: 500

According to the documentation, the XML should have this format:

<add>
  <doc>
    <field name="employeeId">05991</field>
    <field name="office">Bridgewater</field>
    <field name="skills">Perl</field>
    <field name="skills">Java</field>
  </doc>
  [<doc> ... </doc>[<doc> ... </doc>]]
</add>

So your XML should look like this:

<?xml version="1.0" encoding="UTF-8"?>

<add> 
  <doc> 
    <field name="title">3Days 2Night Chiang Mai to Chiang Rai</field>  
    <field name="id">83509</field>  
    <field name="revision_id">1305791</field>
    <field name="revision_timestamp">2009-11-27T10:35:53Z</field>
    ....
  </doc> 
</add>
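If you would rather convert the file than write it by hand, something along these lines might work. It is only a sketch using Python's standard library: the `FIELD_MAP` mapping and the function name `pages_to_solr_add` are hypothetical, and the mapping only covers fields declared in the question's schema (note that the schema calls the field `comments` while the source element is `comment`). Fields such as `revision_id` would need to be declared in schema.xml before you add them to the mapping.

```python
import xml.etree.ElementTree as ET

# Path inside <page> -> Solr field name. This mapping is an assumption
# based on the question's sample data and schema; adjust it to your own.
FIELD_MAP = {
    "id": "id",
    "title": "title",
    "revision/comment": "comments",
    "revision/text": "text",
}


def pages_to_solr_add(wiki_xml: str) -> str:
    """Turn <page> elements into the <add><doc><field .../></doc></add> form."""
    add = ET.Element("add")
    for page in ET.fromstring(wiki_xml).iter("page"):
        doc = ET.SubElement(add, "doc")
        for path, solr_name in FIELD_MAP.items():
            node = page.find(path)
            # Leave out elements that are missing or have no text.
            if node is None or not (node.text or "").strip():
                continue
            field = ET.SubElement(doc, "field", name=solr_name)
            field.text = node.text.strip()
    return ET.tostring(add, encoding="unicode")
```

You could write the returned string to a new file and post that file with post.jar in place of the original wiki.xml.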

Upvotes: 0
