RoryB

Reputation: 1047

Solr 3.5 only searching part of an indexed file

I am indexing some large files (~30,000 lines) using Solr 3.5. The contents of each file are indexed as the field filecontents. Searching for a file by name shows that this field contains the full contents of the file.

However, if I query for a term in this field, for example using filecontents:fred, I only get a hit if the term appears in roughly the first 2000 lines of the file. For example, I get a hit if the term "fred" is on line 200, but not if it appears only on line 4000.

Any idea why the rest of the filecontents field is not being searched, or how I might investigate this further? The relevant parts of my schema.xml file are below. Interestingly, we don't see the same problem using Solr 4.3.

    <fieldType name="default" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.ClassicFilterFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.StopFilterFactory" />
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory" />
            <filter class="solr.ClassicFilterFactory" />
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.StopFilterFactory" />
        </analyzer>
    </fieldType>

<field name="filecontents" type="default" indexed="true" stored="true" multiValued="true" omitNorms="false" termVectors="false"/>

Upvotes: 0

Views: 91

Answers (1)

d whelan

Reputation: 804

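Change <maxFieldLength> in solrconfig.xml to a larger number. This setting caps how many tokens are indexed per field (10000 in the example configuration shipped with Solr 3.x). Tokens past the limit are dropped from the index while the stored value stays complete, which matches what you are seeing. <maxFieldLength> appears in both <mainIndex> and <indexDefaults>, so raise it in both places and reindex the files afterwards.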
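
As a rough sketch of the change, the relevant part of a Solr 3.x solrconfig.xml could look like the fragment below. The value 2147483647 (Integer.MAX_VALUE) is an assumption chosen to lift the cap in practice; pick whatever limit suits your files, and leave the other settings in these two sections as they are.

```xml
<!-- solrconfig.xml (Solr 3.x). Sketch only: the value below is an assumed
     example, and your existing sections will contain other settings too. -->
<indexDefaults>
  <!-- Maximum number of tokens indexed per field; tokens past this are dropped. -->
  <maxFieldLength>2147483647</maxFieldLength>
</indexDefaults>

<mainIndex>
  <!-- Keep this in step with indexDefaults so neither section reimposes a lower cap. -->
  <maxFieldLength>2147483647</maxFieldLength>
</mainIndex>
```

Documents that are already indexed keep their truncated token lists, so the files need to be reindexed before terms further down the file become searchable.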

Upvotes: 1
