Duplicate terms in Solr index

Question

I have a question that I cannot answer myself, even after trying hard.

I think it is a matter of comprehension.

So...

I'm trying to index a long text field (a product description), which can contain duplicate words. Let's say we are talking about a flavour and we say chocolate, then continue speaking, and then say chocolate again.
When Solr indexes the field (as far as I understand from the Analysis tab in the Solr control panel), it creates a term for each token we have. Terms are "pointers": each term is associated with the uniqueKey attribute that identifies the "item".

Is the Solr index going to have two terms pointing to the same item?

This is my text analyzer:

I thought it deleted duplicate entries, but when I looked at the analysis I found this:

screenshot

As far as I understand Solr, in the end my index is going to have these three terms pointing to that "item": chocolate, blablabla and chocolate. Is that right?

I hope the question is clear :)

Thanks !

Aujasvi Chitkara · Accepted Answer

What you see in the Analysis tab is the token stream just before the text is indexed into Solr. When you actually index it, Solr stores each term just once and saves all occurrences of that term in the form (document_id, position).

I hope the example below makes it clearer.

Suppose you want to add the following three documents to Solr:

T[0] = "dark chocolate is the best chocolate"

T[1] = "i love dark chocolate"

T[2] = "chocolate is delicious"

Solr will store them in the inverted index as follows:

"best": {(T[0], position)}

"chocolate": {(T[0], position1), (T[0], position2), (T[1], position), (T[2], position)}

"dark": {(T[0], position), (T[1], position)}

"delicious": {(T[2], position)}

"i": {(T[1], position)}

"is": {(T[0], position), (T[2], position)}

"love": {(T[1], position)}

"the": {(T[0], position)}

Note:

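position records where the term occurs in the document (its token position; start and end character offsets can be stored as well, depending on the field configuration)
the term chocolate is stored once in the index, but has two references to document T[0]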
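
To make the structure above concrete, here is a minimal Python sketch of an inverted index built from the same three documents. It is only an illustration, not how Solr/Lucene is actually implemented: the function name build_inverted_index is made up for this example, plain whitespace splitting stands in for a real analyzer chain, and position here is simply the token's index within the document.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct term to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Assumption: whitespace tokenization replaces the real analyzer.
        for position, term in enumerate(text.lower().split()):
            # A repeated term gets a new posting, not a new dictionary entry.
            index[term].append((doc_id, position))
    return dict(index)


docs = [
    "dark chocolate is the best chocolate",  # T[0]
    "i love dark chocolate",                 # T[1]
    "chocolate is delicious",                # T[2]
]

index = build_inverted_index(docs)
for term in sorted(index):
    print(term, index[term])
```

The key "chocolate" appears once in the dictionary, while its postings list holds two entries for T[0], which mirrors the note above.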
