yura

Reputation: 14645

Difference in scoring between multivalued field and tokenized field

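For example, I have several tags per document. I can either index them all in a single tokenized field, or index each tag as a separate value of a multivalued field.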

Both approaches will work. The question is: how will scoring differ between these two types of indexing (i.e. field normalization factor, tf/idf counts, field length calculation, slop factor, etc.)?

Upvotes: 2

Views: 708

Answers (2)

milan

Reputation: 12402

Lucene concatenates all the values of a multivalued field behind the scenes anyway, so scoring will differ little from your first case, if at all. If you use tags only as filters (give me all docs tagged with tag2), then you definitely won't see any difference.
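To make the concatenation point concrete, here is a toy model in Python. It is not Lucene code: `index_field` and `field_stats` are hypothetical helpers, the tokenizer is a plain whitespace split, and the `1/sqrt(length)` length norm is an assumption in the style of the classic TF-IDF similarity (the actual norm depends on the Lucene version and the similarity in use).

```python
import math
from collections import Counter


def index_field(values):
    """Toy model: whitespace-tokenize each value and append the tokens
    to one shared stream, the way a multivalued field is concatenated."""
    tokens = []
    for value in values:
        tokens.extend(value.lower().split())
    return tokens


def field_stats(values):
    """Collect the per-field numbers that a tf/idf-style scorer would use."""
    tokens = index_field(values)
    return {
        "tf": Counter(tokens),                         # term frequencies
        "length": len(tokens),                         # field length in tokens
        "length_norm": 1.0 / math.sqrt(len(tokens)),   # assumed classic-style norm
    }


# One tokenized string vs. the same tags as separate values.
single = field_stats(["tag1 tag2 tag3"])
multi = field_stats(["tag1", "tag2", "tag3"])

print(single == multi)
```

Under these assumptions both ways of indexing give the same term frequencies and the same field length, so a term query on a tag would be scored the same either way.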

Upvotes: 1

d whelan

Reputation: 804

I would think the multivalued field would be more accurate.

Imagine a tokenized string "spider web developer"

vs.

a multivalued field with the values "spider" and "web developer".

A search for "web developer" would match both fields, but the match against the multivalued field could be seen as more accurate, because there "web developer" is a complete value rather than part of a longer string.
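A rough sketch of where the two can actually diverge for phrase queries. This is a toy model in Python, not Lucene code: `positions` and `phrase_match` are hypothetical helpers, and `gap` stands in for the position increment gap between values of a multivalued field (`positionIncrementGap` in a Solr schema, `Analyzer.getPositionIncrementGap` in Lucene). The value 100 is only an illustrative choice.

```python
def positions(values, gap=0):
    """Return (term, position) pairs; `gap` extra positions are inserted
    between consecutive values of the field."""
    out = []
    pos = 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for term in value.lower().split():
            out.append((term, pos))
            pos += 1
    return out


def phrase_match(indexed, phrase):
    """Exact phrase match: the terms must sit at consecutive positions."""
    terms = phrase.lower().split()
    by_term = {}
    for term, pos in indexed:
        by_term.setdefault(term, set()).add(pos)
    return any(
        all(start + k in by_term.get(t, ()) for k, t in enumerate(terms))
        for start in by_term.get(terms[0], ())
    )


tokenized = positions(["spider web developer"])
multi = positions(["spider", "web developer"], gap=100)

print(phrase_match(tokenized, "web developer"), phrase_match(multi, "web developer"))
print(phrase_match(tokenized, "spider web"), phrase_match(multi, "spider web"))
```

In this model "web developer" matches both fields, while "spider web" matches only the single tokenized string, since the gap keeps a phrase from spanning two values. With `gap=0` the two fields behave identically.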

Upvotes: 0
