Reputation: 42095
I've skimmed the docs for the Java version of Lucene, but I can't really see the top-level "this is how it works" info so far (I'm aware I need to RTFM, I just can't see the wood for the trees).
I understand Lucene uses search indexes to return results. As far as I know, it only returns "hits" from those indexes. If I haven't added an item of data when building the index then it won't be returned.
That's fine, so now I want to check the following assumption:
Q: Does that mean that any data I want displayed on a search page needs to be added to the Lucene index?
I.e.
If I want to search for Product
s by things like sku, description, category name, etc, but I also want to display the Customer
they belong to in search results, do I:
Customer
's name in the index.JOIN
to get the Customer
's name.I assume it's option 1, since I'm assuming there's no way to "join" the results of a Lucene query to an RDBMS, but wanted to ask it my assumptions about the general usage are correct.
Upvotes: 4
Views: 1749
Reputation: 334
I have been trying to figure out the same problem, but I think that its too much work. I'm thinking of this as an alternative. Plse correct me if I'm wrong in my thinking!
Your situation is like this: RDBMS product (many) <------> (many) Customer
Instead of putting only customer in lucene index to get product keys, and then query RDBMS with IN Query, I'd suggest, create the lucene index with the cartesian product of Product as well as Customer.
Like customer_1, product_1 customer_1, product_2 customer_2, product_2..
This way, when you are searching for a product in lucene, it will give both the customer as well as the products id.. and instead of joining them in RDBMS, you can simply look up those customers as well as products for more information from RDBMS, if there is a need. If you are using caching, then the additional details lookup cost will also go down.
Upvotes: 1
Reputation: 42095
Based on BrokenGlass's answer, I've thought some more and am proposing the following to see if I'm on the right lines:
Basically, taking option 2 further, one could do the following:
IN (value [, value])
predicate.JOIN
s to other tables).Of course steps 5 and 6 could be better, but for the sake of explanation I put that verbose method in my description. If the Lucene hits include some sort of "relevance" value, then you could attribute that to the resultset and perform a standard sort, but that's an exercise for the reader. :)
Could this be it?
Upvotes: 0
Reputation: 160852
Usually the index would only contain the fields you want to search on, not necessarily the ones you want to display. Indexes should be optimized to be as small as possible, to keep search performance good.
To be able to display more data add a field to your index that allows you to retrieve your full document/data, i.e. a unique key for your Product
(product id?).
Upvotes: 1