aboutgeo

Reputation: 302

Lucene: Multiple StringFields with same name - query only matches on last

I have a Lucene Document that contains multiple StringFields with the same name like so:

doc.add(new StringField("uri", "http://www.doesn-t-work.com/foo", Field.Store.YES))
doc.add(new StringField("uri", "http://www.doesn-t-work.com/baz", Field.Store.YES))
doc.add(new StringField("uri", "http://www.this-works.com/bar", Field.Store.YES))

I'm using the StandardAnalyzer when writing the Document to the index, but as far as I understand, this shouldn't matter, because a StringField is indexed verbatim as a single token and is not analyzed:

new IndexWriter(placeIndex, new IndexWriterConfig(Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48)))

What I want to do (obviously) is run a query for any one of the values of the "uri" field and get the document back. But I only get a result when I query with the URI value that was added to the document last. Querying with any of the other values (i.e. the 'doesn-t-work' ones) returns zero hits.

The query I'm using is this:

new TermQuery(new Term("uri", "http://www.doesn-t-work.com/foo")) // 0 hits

new TermQuery(new Term("uri", "http://www.this-works.com/bar")) // 1 hit

Additional note: when I get the document back by querying with the last URI, I can verify that all three URI values are stored. It's just the indexed terms that seem to be overwritten... (or I'm misinterpreting how/if multi-valued StringFields work).

Any hints much appreciated!

Upvotes: 1

Views: 607

Answers (1)

aboutgeo

Reputation: 302

Ouch - it turned out this was caused by another problem. I tested this scenario separately, and it works perfectly in the way described above.
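A minimal sketch of such a standalone test, assuming Lucene 4.8 with lucene-core and lucene-analyzers-common on the classpath. The class name MultiValuedUriCheck and the helper hitsFor are made up for illustration; the in-memory RAMDirectory stands in for the real index location.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class MultiValuedUriCheck {

    // Hypothetical helper: indexes ONE freshly created Document that holds all
    // given URIs, then returns the number of hits for an exact-term query.
    static int hitsFor(String queryUri, String... uris) throws IOException {
        Directory dir = new RAMDirectory(); // in-memory index, for the test only
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48)));

        Document doc = new Document();
        for (String uri : uris) {
            // Same field name every time: this is what makes the field multi-valued.
            doc.add(new StringField("uri", uri, Field.Store.YES));
        }
        writer.addDocument(doc);
        writer.close(); // close() also commits the pending document

        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            return searcher.search(new TermQuery(new Term("uri", queryUri)), 10).totalHits;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        String[] uris = {
            "http://www.doesn-t-work.com/foo",
            "http://www.doesn-t-work.com/baz",
            "http://www.this-works.com/bar"
        };
        for (String uri : uris) {
            System.out.println(uri + " -> " + hitsFor(uri, uris) + " hit(s)");
        }
    }
}
```

Each of the three URIs should come back with one hit here, since every value of a multi-valued StringField is indexed as its own term.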

In my application, however, I was adding the different URIs one by one, with an index write in between. That is, I'd add one URI and write the Document to the index. Later, I'd retrieve the Document, add another URI, write it back, and so on.

In short, it turned out that re-using the Documents was the issue: when I created a new Document instance from scratch with multiple URI fields, everything went well.

Lesson of the day, I guess: don't re-use your Lucene Documents.
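A likely explanation, which is not spelled out above: the Lucene javadoc for retrieving documents notes that only stored content comes back, and that field metadata such as "tokenized" is not preserved. Re-indexing a retrieved Document would therefore send the old "uri" values through the StandardAnalyzer, so the exact-URL term no longer exists for them, while the newly added StringField still matches.

One way to follow the lesson is sketched below: copy the stored values into a brand-new Document and replace the old one. This assumes Lucene 4.8 with lucene-core and lucene-analyzers-common on the classpath; the class FreshDocumentUpdate and its methods are hypothetical names, and locating the document through one of its existing URIs is an assumption, since the original does not say how the document was looked up.

```java
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class FreshDocumentUpdate {

    // Copies the stored "uri" values into a NEW Document as StringFields and
    // appends one more, instead of re-adding the retrieved Document itself.
    static Document withAdditionalUri(Document retrieved, String newUri) {
        Document fresh = new Document();
        for (String uri : retrieved.getValues("uri")) {
            fresh.add(new StringField("uri", uri, Field.Store.YES));
        }
        fresh.add(new StringField("uri", newUri, Field.Store.YES));
        return fresh;
    }

    // Finds the document via a URI it already holds (an assumption made for
    // this sketch) and replaces it with the rebuilt copy.
    static void addUri(Directory dir, IndexWriter writer, String knownUri, String newUri)
            throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("uri", knownUri)), 1);
            if (hits.totalHits == 0) {
                return; // nothing to update
            }
            Document retrieved = searcher.doc(hits.scoreDocs[0].doc);
            // updateDocument deletes documents containing the term, then adds the new one.
            writer.updateDocument(new Term("uri", knownUri), withAdditionalUri(retrieved, newUri));
            writer.commit();
        } finally {
            reader.close();
        }
    }

    // Number of documents whose "uri" field contains exactly this term.
    static int countHits(Directory dir, String uri) throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            return new IndexSearcher(reader).search(new TermQuery(new Term("uri", uri)), 10).totalHits;
        } finally {
            reader.close();
        }
    }

    // Adds the three URIs one by one, with an index write in between,
    // and returns the hit count for each of them.
    static int[] demo() throws IOException {
        String foo = "http://www.doesn-t-work.com/foo";
        String baz = "http://www.doesn-t-work.com/baz";
        String bar = "http://www.this-works.com/bar";

        Directory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new IndexWriterConfig(Version.LUCENE_48, new StandardAnalyzer(Version.LUCENE_48)));
        Document first = new Document();
        first.add(new StringField("uri", foo, Field.Store.YES));
        writer.addDocument(first);
        writer.commit();

        addUri(dir, writer, foo, baz);
        addUri(dir, writer, foo, bar);
        writer.close();

        return new int[] { countHits(dir, foo), countHits(dir, baz), countHits(dir, bar) };
    }

    public static void main(String[] args) throws IOException {
        for (int hits : demo()) {
            System.out.println(hits + " hit(s)");
        }
    }
}
```

Note that this only carries over stored fields; anything indexed but not stored would have to be rebuilt from its original source.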

Upvotes: 2
