Andy McCluggage

Reputation: 38728

What would be the best way to index and search my data using Lucene?

I’ve found multiple questions on SO and elsewhere along the lines of “How can I index and then search relational data in Lucene?” Quite rightly, these are met with the standard response that Lucene is not designed to model data like this. This quote I found sums it up:

A Lucene Index is a Document Store. In a Document Store, a single document represents a single concept with all necessary data stored to represent that concept (compared to that same concept being spread across multiple tables in an RDBMS requiring several joins to re-create).

So I will not ask that question. Instead, I will describe my high-level requirements and see if any Lucene gurus out there can help me.

We have two entities, Person and Company, each with its own properties. The many-to-many link between them also has properties of its own.

Some example searches could be as follows…

The criteria span all three sets of data. Our requirement is to provide a faceted search over the data that accepts any combination of the various properties, of which I have given some examples.

I would like to use Lucene.Net for this. We are a .NET software house and so feel slightly intimidated by Java. However, all suggestions are welcome.

I am aware of the idea that the index should be constructed with the search in mind, but I can’t seem to come up with a sensible index that would meet all the combinations of search criteria.

For now I won’t describe the scenarios we have considered because I don’t want to bloat out this question and make it too intimidating. Please ask me to elaborate where necessary.

Upvotes: 1

Views: 475

Answers (1)

Fred Foo

Reputation: 363817

To store both companies and people in a single index, you could create documents with a type field that identifies the type of entities they describe.

Birthdays can be stored as date fields.

You could give each person a simple text field containing the names of companies that they worked for. Note that you won't get an error if you enter a company that is not represented by a document in your index. Lucene is not a relational DB tool, but you knew that.
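The layout described above can be sketched without any Lucene calls. The following is a minimal, library-free illustration in Python, not Lucene or Lucene.Net code. The field names (type, name, birthday, companies), the sample data, and the search helper are all assumptions made up for this example. It only shows the shape of the documents: one flat document per entity, a type field to tell them apart, a date-valued birthday, and a per-person list of company names that nothing validates against the company documents.

```python
from datetime import date

# Hypothetical sample documents. The field names are illustrative
# assumptions, not part of any Lucene API.
docs = [
    {"type": "company", "name": "Acme"},
    {"type": "company", "name": "Globex"},
    {"type": "person", "name": "Alice", "birthday": date(1980, 5, 17),
     "companies": ["Acme", "Globex"]},
    # "Initech" has no company document; nothing enforces that it should.
    {"type": "person", "name": "Bob", "birthday": date(1992, 1, 3),
     "companies": ["Initech"]},
]


def search(docs, doc_type, company=None, born_before=None):
    """Return the names of documents that match every supplied criterion."""
    hits = []
    for doc in docs:
        # The type field separates people from companies in one index.
        if doc["type"] != doc_type:
            continue
        # Company membership is a plain lookup in the person's own field.
        if company is not None and company not in doc.get("companies", []):
            continue
        # Birthdays are stored as dates, so range checks are direct.
        if born_before is not None:
            birthday = doc.get("birthday")
            if birthday is None or birthday >= born_before:
                continue
        hits.append(doc["name"])
    return hits


people_at_acme = search(docs, "person", company="Acme")
older_people = search(docs, "person", born_before=date(1990, 1, 1))
companies = search(docs, "company")
```

In a real index, each dictionary would become one document, and the criteria would presumably be combined in a boolean query rather than a Python loop. Note that this sketch does not cover properties of the person-company link itself.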

(Sorry that I've not posted any links to the API; I'm familiar with Lucene Core but not Lucene.NET.)

Upvotes: 2
