willemIP

Reputation: 21

Lucene Scoring: TermQuery w & w/o TermVectors

Does TermQuery.ExtractTerms return more terms when term vectors/positions/offsets are turned on (assuming there is more than one occurrence of a match)? Conversely, with the inverted file info turned off, does ExtractTerms always return one and only one term?

EDIT: How and where does turning on termvectors in the schema affect scoring?

Upvotes: 1

Views: 631

Answers (1)

Xodarap

Reputation: 11849

TermQuery.ExtractTerms extracts the terms in the query, not the terms in the results. So calling it on a TermQuery for "foo:bar" will always return exactly one term, regardless of what's in the index.
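To make that concrete, here is a minimal sketch in plain Java. It is a toy model and not the Lucene API: ToyTermQuery, ToyExtractTermsDemo, and countExtractedTerms are hypothetical names invented for illustration. It only shows the shape of the idea, namely that term extraction reads the query object and never consults an index.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a single-term query. This is NOT the Lucene class;
// it only models the fact that the query carries exactly one field:text term.
final class ToyTermQuery {
    private final String field;
    private final String text;

    ToyTermQuery(String field, String text) {
        this.field = field;
        this.text = text;
    }

    // Adds the query's own term to the caller's set. No index is passed in,
    // so index-side settings cannot change what ends up in the set.
    void extractTerms(Set<String> terms) {
        terms.add(field + ":" + text);
    }
}

final class ToyExtractTermsDemo {
    // Counts the terms the query contributes, whatever the index holds.
    static int countExtractedTerms(ToyTermQuery query) {
        Set<String> terms = new LinkedHashSet<>();
        query.extractTerms(terms);
        return terms.size();
    }

    public static void main(String[] args) {
        ToyTermQuery query = new ToyTermQuery("foo", "bar");
        System.out.println(countExtractedTerms(query));
    }
}
```

Because the method takes no index or reader argument in this model, storing term vectors, positions, or offsets has nothing to act on here.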

It sounds to me like you want to know about highlighting, not Query.ExtractTerms.

EDIT: Based on your comment, it sounds like you are asking: "how is scoring affected by term vectors?" The answer to that is: not at all. The term frequency, norm, etc. are calculated at index time, so it doesn't matter what you store.

The major exception is PhraseQuery with slop, which uses the term positions. A minor exception is that custom scoring classes can use whatever data they want, so not only term vectors but also payloads etc. can potentially affect the score.

If you're only running TermQuery instances, though, what you store should have no effect.

Upvotes: 1
