NagyI
NagyI

Reputation: 5997

Solr denormalization and update of referenced data

Consider the following situation. We have a database which stores writers and books in two separate tables. One book obviously stores the reference to the writer who wrote the book. For Solr i have to denormalize this structure into one big document where every book contains the details of the writer associated. This index is now used for querying books.

One user of the system now decides to update a writer record in the system. Because many books can be associated with it i have to update every document in Solr which have embedded data from this writer record. This is very painful because i have to delete and re-add every affected document as far as i know.

Is there any better way of doing this? I need near realtime update of the index in the system if one of the referenced data gets modified.

Upvotes: 3

Views: 1325

Answers (2)

javanna
javanna

Reputation: 60235

This would be a perfect usecase for nested documents. As far as I know lucene does support nested documents but Solr doesn't, not totally sure about the current state of this feature.

This feature is available in elasticsearch though. You might want to have a look at it, there's an article I just wrote that can be interesting if you want to know what's so cool about elasticsearch in my opinion. Your question just reminded me that I didn't mention the nested documents feature in my article, which is really cool too. You can use the nested type in your mapping. If you want to know more you can have a look at this article. By the way it contains exactly the books/authors example.

Elasticsearch also helps you while updating documents. You don't need to reindex the whole document but send only the changes through a script. Thanks to the fact that it stores the source document that has been indexed it internally retrieves it, updates it running the script and reindexes it. That's how lucene internally works since its index segments are write-once. With Solr 4, which will be soon released, you can update documents providing only the changes, but as far as I know this works only if all your fields are stored. The fields that are not stored cannot be retrieved from the index.

If we are talking about Near Real Time updates, elasticsearch does use the Lucene Near Real Time API and refreshes automatically the index reader every second. Solr 3 doesn't use yet those APIs but Solr 4 does.

Upvotes: 2

RaB
RaB

Reputation: 1565

For updating nested types in SOLR you can use dataimporters and delta imports. The example on https://wiki.apache.org/solr/DataImportHandler#Delta-Import_Example shows how this would work. Obviously you would then need to have solr access your database.

Upvotes: 0

Related Questions