Namo

Reputation: 1

How can I resolve the MAX_ARRAY_LENGTH error when indexing data in Solr?

I am following the Cloudera Search indexing documentation linked below:

https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_data_index_prepare.html
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/search_batch_index_use_mapreduce.html

I have prepared a collection, a schema file, and a morphline file for my dataset, which is in CSV format. A sample of the data:

id  jobtitle                    jobdescription           city        state  classification  salary
1   Senior Android Developer    complex problem solving  New Hope    PA     it              94036
2   Mobile Solutions Developer  complex problem solving  Glen Allen  VA     it              60726

The MapReduceIndexerTool (MRIT) command I am using is:

sudo -u hdfs hadoop \
--config /etc/hadoop/conf.cloudera.yarn \
jar /opt/cloudera/parcels/CDH/lib/solr/contrib/mr/search-mr-*-job.jar \
org.apache.solr.hadoop.MapReduceIndexerTool \
-D 'mapred.child.java.opts=-Xmx500m' \
--log4j /opt/cloudera/parcels/CDH/share/doc/search-1.0.0+cdh5.8.3+0/examples/solr-nrt/log4j.properties \
--morphline-file $HOME/jobs.conf \
--output-dir NN:8020/user/$USER/outdir \
--zk-host localhost/solr --collection jobs \
--go-live \
NN:8020/user/$USER/indir

Below is my schema file:

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
<fields>

    <!-- Posts -->
    <field name="id" type="string" indexed="true" stored="true" 
    required="true"/>
    <field name="jobtitle" type="text_general" indexed="true" 
    stored="true"/>
    <field name="jobdescription" type="text_general" indexed="true" 
    stored="true" termVectors="true"/>
    <field name="classification" type="splitOnPeriod" indexed="true" 
    stored="true"/>
    <field name="city" type="text_general" indexed="true" stored="true"/>
    <field name="state" type="text_general" indexed="true" stored="true"/>
    <field name="salary" type="int" indexed="true" stored="true"/>
    <field name="_version_" type="long" indexed="true" stored="true"/> 
    <field name="content" type="text_general" indexed="true" stored="true" 
    multiValued="true"/> 
    <field name="text" type="text_general" indexed="false" stored="true" 
    multiValued="true"/>   

    <copyField source="jobtitle" dest="content" />   
    <copyField source="jobdescription" dest="content" />  
</fields>

<types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" 
    positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0"                 
    positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" 
    positionIncrementGap="0"/>

    <fieldType name="text_general" class="solr.TextField" 
    positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>

    <fieldType name="splitOnPeriod" class="solr.TextField" 
    positionIncrementGap="100">
         <analyzer>
            <tokenizer class="solr.PatternTokenizerFactory" pattern="\." />
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>        
</types>

<uniqueKey>id</uniqueKey>

</schema>

A dry run worked, but with --go-live I always get a MAX_ARRAY_LENGTH error.

1554 [main] INFO  org.apache.solr.hadoop.MapReduceIndexerTool  - Indexing 1 files using 1 real mappers into 2 reducers
Error: MAX_ARRAY_LENGTH

The error appears to occur in the map phase:

1686 [main] ERROR org.apache.hadoop.mapred.YarnChild  - Error running child : java.lang.NoSuchFieldError: MAX_ARRAY_LENGTH
    at org.apache.lucene.codecs.memory.DirectDocValuesFormat.<clinit>(DirectDocValuesFormat.java:58)

Please help me out with this problem.

Upvotes: 0

Views: 122

Answers (1)

Namo

Reputation: 1

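This error generally happens when something is not installed consistently in your environment. A java.lang.NoSuchFieldError is a linkage error, not a configuration or heap-size problem: DirectDocValuesFormat was compiled against a version of org.apache.lucene.util.ArrayUtil that declares the static field MAX_ARRAY_LENGTH, but the ArrayUtil class actually loaded at runtime does not have that field. The usual cause is mismatched Lucene jars on the job's classpath (for example, an older lucene-core next to a newer lucene-codecs). MAX_ARRAY_LENGTH is a constant in that class, not a property you can tune, so the fix is to make the Lucene/Solr jars consistent, for example by upgrading or reinstalling CDH so that all components come from the same release. I ran the same MRIT command in a different environment and it worked fine.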
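To check which jar a class is really loaded from, a small reflection probe can help. The following is only a minimal sketch: the class name FieldCheck and its two helper methods are made up for illustration and are not part of Solr, Lucene, or CDH. It reports where a class was loaded from and whether it declares a given field, defaulting to ArrayUtil and MAX_ARRAY_LENGTH from the stack trace.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Lucene, Solr, or CDH).
public class FieldCheck {

    /** True if the class can be loaded and itself declares a field with this name. */
    public static boolean hasDeclaredField(String className, String fieldName) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, FieldCheck.class.getClassLoader());
            cls.getDeclaredField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    /** Jar or directory the class was loaded from, or a placeholder if unknown. */
    public static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false, FieldCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Defaults are the class and field named in the stack trace
        String cls = args.length > 0 ? args[0] : "org.apache.lucene.util.ArrayUtil";
        String field = args.length > 1 ? args[1] : "MAX_ARRAY_LENGTH";
        System.out.println(cls + " loaded from: " + locationOf(cls));
        System.out.println("declares " + field + ": " + hasDeclaredField(cls, field));
    }
}
```

Run it with a classpath that resembles the one the map task sees. For example, put the output of `hadoop classpath` plus the search-mr job jar on `-cp`. That is an approximation, since the real YARN child classpath may differ. If the reported location is an unexpected lucene-core jar, or the field is reported as missing, that jar is the likely source of the conflict.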

References - https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/util/ArrayUtil.html

Upvotes: 0
