Alex M

Reputation: 511

Solr loading information without data import handler

I have 700,000 street names, 8,111 municipality names, and 80,333 locality postcodes. I would like to index all of this information in Solr. Users will search it through an AJAX autocomplete form. I have tested it with a small amount of data, and the autocomplete form behaves correctly. This is my field type:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

The problem occurs when I load all of the data into Solr.

Is it okay to have a separate document for each item (700,000 + 8,111 + 80,333 documents)?

Thanks for your time.

Upvotes: 1

Views: 923

Answers (2)

Michael Dillon

Reputation: 32392

Seriously, write a shell script and use curl to send the updates to SOLR.

You are trying to shoot cans off the wall with a cannon mounted on a ship floating in your swimming pool. You don't need a cannon or a ship or a pool. Just stand there with an air gun and pop the updates off one by one until done.

For an example shell script complete with sample SOLR updates, download the SOLR binary, either apache-solr-3.5.0.tgz or apache-solr-3.5.0.zip, from a mirror near you. Find the mirror at http://lucene.apache.org/solr/downloads.html

Unpack the archive, go into the example directory, and follow the instructions at http://lucene.apache.org/solr/tutorial.html

If you are on UNIX, just use post.sh.
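
As a rough sketch of what one of those updates looks like: the field names `id`, `type`, and `name` below are hypothetical and have to match whatever your schema.xml defines, and the values are made-up samples. Assuming the default example server, a file like this can be sent with something along the lines of `curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary @streets.xml`, followed by a second request containing `<commit/>`.

```xml
<!-- streets.xml: a Solr XML update message (field names are assumptions) -->
<add>
  <!-- one <doc> per street, municipality, or postcode -->
  <doc>
    <field name="id">street-1</field>
    <field name="type">street</field>
    <field name="name">Calle Mayor</field>
  </doc>
  <doc>
    <field name="id">street-2</field>
    <field name="type">street</field>
    <field name="name">Gran Via</field>
  </doc>
</add>
```

Putting many `<doc>` elements in one `<add>` and committing once at the end is usually much faster than sending and committing each document individually.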

By the way, check the SOLR version that you have installed on your server. If it isn't 3.5.0, then why are you using an old version when you have the newer one right here, right now?

Upvotes: 1

beerbajay

Reputation: 20270

I assume your municipalities, street names, and post codes are supposed to be autocompleted separately. In this case you'd use a separate solr core for each one.

Or should I use the Data Import Handler to load it faster?

DIH will be pretty fast, and as long as this information doesn't change very often, it should be fine to do it this way.

Can I concatenate string values from different columns of different tables with the Data Import Handler?

Yes; in data-config.xml you give a specific SQL query, so you can use the database's native concatenation (e.g. || in Oracle).
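
A minimal sketch of such a data-config.xml, under assumptions: the table and column names (`streets`, `municipalities`, `municipality_id`), the Solr field `label`, and the connection details are all hypothetical. The example uses a PostgreSQL JDBC URL only because `||` works there too; for Oracle, swap in the Oracle driver and URL, and check how column names are cased in the result set.

```xml
<dataConfig>
  <!-- connection settings are placeholders; adjust driver, url, and credentials -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/addresses"
              user="solr"
              password="secret"/>
  <document>
    <!-- the join and the || concatenation happen in the database, not in Solr -->
    <entity name="street"
            query="SELECT s.id AS id, s.name || ', ' || m.name AS label
                   FROM streets s
                   JOIN municipalities m ON m.id = s.municipality_id">
      <field column="id" name="id"/>
      <field column="label" name="label"/>
    </entity>
  </document>
</dataConfig>
```

Each row returned by the query becomes one Solr document, so the concatenated `label` value is what the autocomplete would search against.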

Upvotes: 1
