Shashir Reddy

Reputation: 51

Inserting a TermVector into Lucene

Learning how to use Lucene!

I have an index in Lucene which is configured to store term vectors.

I also have a set of documents for which I have already constructed custom term vectors outside of Lucene (for an unrelated purpose).

Is there a way to insert them directly into the Lucene inverted index in lieu of the original contents of the documents?

I imagine one way to do this would be to generate bogus text from the term vector, with the appropriate number of occurrences of each term, and then feed the bogus text to Lucene as the contents of the document. This seems silly because ultimately Lucene will have to convert the bogus text back into a term vector in order to index it.
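To make the workaround concrete, here is a minimal sketch of the bogus-text idea in plain Java. The class and method names are hypothetical, and it assumes the custom term vector is a map from term to occurrence count and that the field is analyzed with something like a whitespace analyzer, so the terms pass through unchanged.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: expands a term -> frequency map
// into whitespace-separated text that repeats each term `frequency` times.
final class BogusTextBuilder {

    static String fromTermFrequencies(Map<String, Integer> termFreqs) {
        StringJoiner text = new StringJoiner(" ");
        for (Map.Entry<String, Integer> entry : termFreqs.entrySet()) {
            // Emit the term once per occurrence so the analyzer counts it again.
            for (int i = 0; i < entry.getValue(); i++) {
                text.add(entry.getKey());
            }
        }
        return text.toString();
    }
}
```

The resulting string would be indexed as the field value. Terms containing whitespace, or terms the analyzer would rewrite (stemming, lowercasing), would not round-trip with this approach.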

Upvotes: 0

Views: 416

Answers (1)

Doug T.

Reputation: 65649

I'm not entirely sure what you want to do with these term vectors ultimately (score? just retrieve?) but here's one strategy I might advocate for.

Instead of focusing on faking out the text attribute of term vectors, consider looking into payloads, which attach arbitrary metadata to each token. During analysis, text is converted to tokens, and a number of attributes are emitted for each token. There are standard attributes like position, term character offsets, and the term string itself. ALL of these can be part of the uninverted term vector. Another attribute is the payload, which is arbitrary metadata you can attach to a term.

You can store any token attribute uninverted as a "term vector" including payloads, which you can access at scoring time.

To do this you need to

  1. Configure your field to store term vectors, including term vectors with payloads
  2. Customize analysis to emit payloads that correspond to your terms. You can read more here
  3. Use IndexReader.getTermVector to pull back the Terms for a document. From that you can get a TermsEnum. You can then use that to get a DocsAndPositionsEnum, which has an accessor for the current payload
  4. If you want to use this in scoring, consider a custom query or custom score query
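One way step 2 might be approached is to serialize each term together with its weight into delimited text and let a payload-aware token filter split them apart again. As far as I know, Lucene's analysis module includes a DelimitedPayloadTokenFilter that treats the part after a delimiter (`|` by default) as the payload, but check that against your Lucene version. The sketch below is plain Java with hypothetical names and only builds the input text; it assumes your custom term vector is a map from term to a float weight.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Lucene: turns a term -> weight map into
// "term|weight term|weight ..." text for a delimiter-based payload filter.
final class PayloadTextBuilder {

    static String toDelimitedText(Map<String, Float> weights, char delimiter) {
        StringJoiner text = new StringJoiner(" ");
        for (Map.Entry<String, Float> entry : weights.entrySet()) {
            // Float.toString is locale-independent, so the decimal separator is always '.'.
            text.add(entry.getKey() + delimiter + Float.toString(entry.getValue()));
        }
        return text.toString();
    }
}
```

For example, a map of `lucene` to 0.5 and `index` to 2.0 with delimiter `|` yields `lucene|0.5 index|2.0`. Terms that themselves contain the delimiter or whitespace would need escaping or a different delimiter.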

Upvotes: 1
