Solr and SolrNet questions and guidance

Question

I am just getting started with Solr and SolrNet. Before I go too far with my current project, I want to verify that I am on the right track. Here is what I am trying to achieve:

Basic Requirements:

Provide a search solution that searches across multiple entities (Car, Ship, Plane, Bicycle, etc.; I made these up for this example). Each entity can have a variable number of fields, and the fields differ from entity to entity.
Provide facets for each entity.
Provide filters for each entity.

Workflow:

User searches with a term.
Four links, one per entity, are displayed, each with the number of hits for that entity next to it.
User clicks a link, and a page is displayed with facets and search results for that entity.

So given that requirement, here is what I have done so far:

Created a single index but with specific fields for each entity like car_name, car_model, car_company, ship_name, ship_model, ship_company, ship_age, ship_size etc.
I have a field entity_type in the index, which is set to one of the entities.
ID is unique across all entities.
I have a DisMax search handler in solrconfig.xml, in which I put all the fields (from all the entities) that should be searched on.

Here is what my DisMax search handler looks like:


    
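    <lst name="defaults">
        <str name="echoParams">explicit</str>
        <str name="defType">edismax</str>
        <str name="qf">
            car_name car_company car_model ship_name ship_company ship_model ship_sailing_route plane_name plane_company plane_model bicycle_name bicycle_company bicycle_model
        </str>
        <str name="q.alt">*:*</str>
        <str name="rows">10</str>
        <str name="fl">*,score</str>
        <str name="facet">on</str>
        <str name="facet.field">car_company</str>
        <str name="facet.field">car_model</str>
        <str name="facet.field">ship_name</str>
        <str name="facet.field">ship_company</str>
        <str name="facet.field">ship_sailing_route</str>
        ...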

And finally, here are my questions:

Is this single-index route the right approach, or should I create a separate index for each entity? Please explain.
If they should be separate indexes, how do I search across them for a given term? And importantly how do I do that using SolrNet?
Is there a way to search across all entities using SolrNet and retrieve the results? Or do I need to execute a query for each entity separately (maybe in parallel) with the same search term?
As you can see, I list all the fields that should be available as facets in the DisMax handler. Is this the right approach? If not, what is?

I am sure I will have more questions as I work through my project, but for now this will do.

Fermin Silva · Accepted Answer

If you are going to have lots of items per type, splitting may be a wise idea (just for the sake of performance, nothing else). It also depends on the similarities and differences between the things you are throwing into the schema.
For example, bicycles, cars, and ships all have company, name, model, etc. in common, so you could have just name, model, and company fields plus another one that says "vehicle_type". If the variable fields (call them optionals) are just a few, you can use dynamic fields for those, so you don't need a rigid schema.
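As a rough illustration of the shared-fields-plus-optionals idea, here is a minimal Python sketch that keeps the common fields as they are and renames everything else to a dynamic-field name. The `_s`, `_i`, `_f`, `_b` suffixes are an assumption: they only work if your schema declares matching dynamicField patterns (Solr's example schemas do, yours may not). The helper name `to_solr_doc` and the sample ship are made up for illustration.

```python
# Hypothetical mapping: Python type -> dynamic-field suffix. Assumes the schema
# declares <dynamicField> patterns such as *_s, *_i, *_f and *_b.
SUFFIX_BY_TYPE = {str: "_s", int: "_i", float: "_f", bool: "_b"}

# Fields every vehicle shares, stored under fixed (non-dynamic) names.
COMMON_FIELDS = ("id", "name", "model", "company", "vehicle_type")


def to_solr_doc(entity: dict) -> dict:
    """Keep shared fields as-is; rename every other field to a dynamic-field name."""
    doc = {}
    for key, value in entity.items():
        if key in COMMON_FIELDS:
            doc[key] = value
        else:
            # type(value) rather than isinstance, so a bool is not treated as an int.
            # Unsupported value types raise KeyError here.
            doc[key + SUFFIX_BY_TYPE[type(value)]] = value
    return doc


ship = {"id": "ship-1", "name": "vairo", "model": "x1", "company": "acme",
        "vehicle_type": "ship", "age": 12, "sailing_route": "atlantic"}
print(to_solr_doc(ship))
```

With this layout a new optional field only needs a new key in the entity, not a schema change.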

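If you go with different indexes (and schemas), your query needs to be aware of all the different fields and schemas. You also need a multicore instance, and (AFAIK) a plain query is handled by a single core. Solr's distributed search (the shards parameter) can fan a query out, but it expects the cores to share a compatible schema, which separate per-entity schemas would not.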

It depends on what queries you want to run. Say you want to search for vehicles whose name is "vairo", but the user doesn't specify whether he wants bicycles or cars or whatever. You need to distribute your search to all the different cores like

/solr/bicycles/select?q=bicycle_name:vairo
/solr/cars/select?q=car_name:vairo
/solr/ships/select?q=ship_name:vairo

and then merge the results. If you put everything on a single index, you could simply search q=name:vairo. Then normally you would facet by "vehicle_type", telling the user that there are 1000 bicycles with that name and very few other vehicles. If the user now specifies "ok, gimme only the bicycles", you keep the query as before but add &fq=vehicle_type:bicycle.

This is far more convenient than handling the logic of which index to query depending on the filter. Also, merging results from more than one response is not trivial.
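The single-index flow above can be sketched in a few lines of Python. This is not SolrNet code, just the underlying request parameters (q, fq, facet, facet.field are standard Solr parameters) and a helper that reads the facet counts back. The function names are made up, the sample response is hand-written, and the parser assumes Solr's default flat [value, count, ...] layout for facet fields.

```python
from urllib.parse import urlencode


def select_params(term, vehicle_type=None):
    """Build /select parameters: one query, faceted by vehicle_type, optionally filtered."""
    # Note: term is not escaped for Solr query syntax; real user input should be.
    params = [
        ("q", f"name:{term}"),
        ("facet", "true"),
        ("facet.field", "vehicle_type"),
        ("wt", "json"),
    ]
    if vehicle_type is not None:
        # Same query as before; the filter only narrows the result set.
        params.append(("fq", f"vehicle_type:{vehicle_type}"))
    return urlencode(params)


def facet_counts(response, field="vehicle_type"):
    """Turn Solr's flat [value, count, value, count, ...] facet list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


print(select_params("vairo"))             # first search, all vehicle types
print(select_params("vairo", "bicycle"))  # after the user picks bicycles

# Hand-written sample shaped like a Solr JSON response; the counts are made up.
sample = {"facet_counts": {"facet_fields": {
    "vehicle_type": ["bicycle", 1000, "car", 3, "ship", 1]}}}
print(facet_counts(sample))
```

The dict returned by `facet_counts` is what you would render as the per-type hit counts next to each link, all from a single request.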

In our company we use a single Solr index for all categories. Obviously each category has optionals that the others lack (think real estate vs. vehicles). Some are handled with dynamic fields and others with normal fields. Solr is fine with a document that leaves some fields out.

For example:

squared_meters
rooms
vehicle_type
vehicle_doors

all in a single index. As you may guess, when we index a document, chances are that half of the fields will be empty (it's either a car or a house). Solr is absolutely OK with that, at both query and index time.
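To make the "half of the fields will be empty" point concrete, here is a small Python sketch that simply leaves empty fields out of each document before sending it. The field names come from the list above; the helper `drop_empty` and the two sample documents are made up, and the payload shape assumes you post to Solr's JSON update handler.

```python
import json


def drop_empty(doc):
    """Return a copy of doc without fields that are None, empty strings, or empty lists."""
    # Zero and False are kept: they are real values, not missing ones.
    return {k: v for k, v in doc.items() if v not in (None, "", [])}


house = {"id": "h-1", "squared_meters": 85, "rooms": 3,
         "vehicle_type": None, "vehicle_doors": None}
car = {"id": "c-1", "squared_meters": None, "rooms": None,
       "vehicle_type": "car", "vehicle_doors": 4}

# A JSON array of documents, each carrying only the fields it actually has.
payload = json.dumps([drop_empty(house), drop_empty(car)])
print(payload)
```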

So, to sum it up:

Consider what kind of queries you want to run. If you only ever search either bikes or cars, never both at once, separate indexes are OK.
Consider how many documents you will have. If there are going to be millions, this logical split will be the best thing you can do for performance, but you'll have to run more queries!
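For comparison, the "more queries" cost of the split setup looks roughly like this in Python: one request per core, each against that core's own name field, with the per-type hit counts read from each response's numFound. The core names, the field mapping, and the sample responses are hypothetical, and the sketch deliberately stops short of merging the result lists into one ranking.

```python
from urllib.parse import urlencode

# Hypothetical core names mapped to the field each core uses for the item name.
NAME_FIELD_BY_CORE = {
    "bicycles": "bicycle_name",
    "cars": "car_name",
    "ships": "ship_name",
}


def fan_out_urls(term, base="/solr"):
    """One /select URL per core, each querying that core's own name field."""
    return {
        core: f"{base}/{core}/select?" + urlencode({"q": f"{field}:{term}", "wt": "json"})
        for core, field in NAME_FIELD_BY_CORE.items()
    }


def hits_per_core(responses):
    """Read the total hit count (response.numFound) out of each core's JSON response."""
    return {core: body["response"]["numFound"] for core, body in responses.items()}


for core, url in fan_out_urls("vairo").items():
    print(core, url)

# Hand-written sample responses; the numbers are made up.
sample = {
    "bicycles": {"response": {"numFound": 1000, "docs": []}},
    "cars": {"response": {"numFound": 3, "docs": []}},
    "ships": {"response": {"numFound": 1, "docs": []}},
}
print(hits_per_core(sample))
```

Counting hits per core is the easy part. Producing one ranked list is harder, because relevance scores computed by separate indexes are not directly comparable.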
