Srinivas Kallepalli

Reputation: 63

Will the Solr collection size increase if we have a lot of fields with indexed=true and stored=false?

I have to create a lot of multiValued fields that will be indexed="true" but stored="false".

Example:

<field name="_text_edge_ngram" type="text_edge_ngram" indexed="true" stored="false" multiValued="true" />

I have a lot of multiValued fields like the one above. I know Solr will not store their values in the collection, but it will create different tokens depending on the field type I assign (ngram, edge ngram, and others).

So will creating these tokens increase the size of the collection?

Upvotes: 1

Views: 190

Answers (2)

Abhijit Bashetti

Reputation: 8658

Yes. When you define a field with indexed=true, its terms are written to the index, so it occupies space and the index size increases.

The more fields you have with indexed=true, the more space is occupied.

Another factor is the field type applied to the field.

If you apply a non-tokenized field type, such as the string field type, the index is smaller.

But if you apply a tokenized field type such as ngram, it will create many tokens, and hence the index will be larger.
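To make the difference concrete, here is a minimal Python sketch (not Solr code) that mimics the two behaviours: a string-style field indexes the whole value as a single term, while an edge n-gram style analysis splits the value into words and emits prefixes of each word. The lowercasing, whitespace splitting, and the gram sizes 2 and 5 are assumptions chosen for illustration; the real analysis chain is whatever your field type defines in the schema.

```python
def string_terms(value: str) -> list:
    # Non-tokenized (string-style) field: the whole value is one term.
    return [value]


def edge_ngram_terms(value: str, min_gram: int, max_gram: int) -> list:
    # Simplified edge n-gram analysis (an assumption, not Solr's chain):
    # lowercase, split on whitespace, then emit the prefixes of each word
    # whose length is between min_gram and max_gram. Words shorter than
    # min_gram produce no terms.
    terms = []
    for word in value.lower().split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            terms.append(word[:n])
    return terms


value = "Red Bicycle"
print(string_terms(value))             # ['Red Bicycle']
print(edge_ngram_terms(value, 2, 5))   # ['re', 'red', 'bi', 'bic', 'bicy', 'bicyc']
```

The same value yields one term in the string-style field and six in the edge n-gram style field, and each of those terms has to be kept in the index even though nothing is stored.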

For example, consider the following analyzer for your field:

<analyzer>
  <tokenizer class="solr.NGramTokenizerFactory" minGramSize="4" maxGramSize="5"/>
</analyzer>

Input text : "bicycle"

Tokens created : "bicy", "bicyc", "icyc", "icycl", "cycl", "cycle", "ycle"

Here you can see that 7 tokens are created for a single word. The count varies with the length of the input and with the configured minGramSize and maxGramSize.
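The token count can be worked out directly: for a word of length L, each gram size n between minGramSize and maxGramSize contributes L - n + 1 substrings (or none if n > L). For "bicycle" (L = 7) with sizes 4 and 5 that is 4 + 3 = 7. The Python sketch below is a simplified stand-in for NGramTokenizerFactory applied to a single word (it ignores details such as which characters count as token characters), and the function names are hypothetical.

```python
def ngram_tokens(word: str, min_gram: int, max_gram: int) -> list:
    # Emit every substring of length min_gram..max_gram, ordered by start
    # position and then by length, matching the "bicycle" example.
    tokens = []
    for start in range(len(word)):
        for n in range(min_gram, max_gram + 1):
            if start + n <= len(word):
                tokens.append(word[start:start + n])
    return tokens


def ngram_count(length: int, min_gram: int, max_gram: int) -> int:
    # Closed form: gram size n fits at (length - n + 1) positions.
    return sum(max(0, length - n + 1) for n in range(min_gram, max_gram + 1))


print(ngram_tokens("bicycle", 4, 5))
# ['bicy', 'bicyc', 'icyc', 'icycl', 'cycl', 'cycle', 'ycle']
print(ngram_count(7, 4, 5))    # 7
print(ngram_count(20, 4, 5))   # 33, e.g. for "internationalization"
print(ngram_count(7, 1, 7))    # 28, widening the gram range to 1..7
```

Widening the gram range or indexing longer words grows the number of terms roughly with length times range, which is why n-gram fields tend to dominate index size.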

This is another reason the index size increases.

So choose the field type for each field carefully.

Upvotes: 3

Gaël J

Reputation: 15090

Short answer: yes, size will increase.

Adding a field, even if it is not stored, means that there will be a new index for this field containing, for each indexed value, the list of matching documents (this is a simplification of how Solr stores data).
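As a rough illustration of that simplified model, the toy Python inverted index below maps each term to the documents that contain it. It is not how Lucene lays out its index files (real postings are compressed and carry more data), and the sample documents and terms are made up; it only shows that every distinct term adds a dictionary entry and every (term, document) pair adds a posting, so analyzers that emit more terms grow the index even when nothing is stored.

```python
from __future__ import annotations

from collections import defaultdict


def build_inverted_index(docs: dict[int, list[str]]) -> dict[str, list[int]]:
    # Map each indexed term to the sorted ids of the documents containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical documents, already analyzed into terms.
docs = {
    1: ["bicy", "bicyc", "icyc"],
    2: ["bicy", "cycl"],
}
index = build_inverted_index(docs)
print(index)  # {'bicy': [1, 2], 'bicyc': [1], 'icyc': [1], 'cycl': [2]}

postings = sum(len(ids) for ids in index.values())
print(len(index), postings)  # 4 5
```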

How much of an increase it represents depends a lot on your data. It may or may not be significant.

Upvotes: 1
