maaz

Reputation: 4493

Solr 4.0: storing and searching normalized profile data

I am evaluating Solr 4.0 and Elasticsearch 0.20.5 for LinkedIn-style search, and I am wondering how to store the normalized data of a user profile. In Elasticsearch this is easily achieved with nested documents.

For example, a person as JSON:

{
    "first_name": "abc",
    "last_name": "xyz",
    "school": [
        {
            "name": "some school",
            "degree": "x-Degree",
            "startDate": "12-02-2009"
        },
        {
            "name": "some school2",
            "degree": "x-Degree-2",
            "startDate": "12-02-2012"
        }
    ]
}

I want to search on a user's school names, degrees, and whether they are currently studying, similar to LinkedIn search.

What's the best way to index and search it in Solr?

Upvotes: 5

Views: 551

Answers (3)

maaz

Reputation: 4493

Indexing should be done using multiValued fields:

<field name="first_name" indexed="true" />
<field name="last_name" indexed="true" />
<field name="school_name" multiValued="true" indexed="true" />
<field name="school_degree" multiValued="true" indexed="true" />
<field name="school_start_date" multiValued="true" indexed="true" />
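To make the flattening concrete, here is a minimal Python sketch that turns the nested person from the question into a document with parallel multiValued fields. The function name flatten_person is hypothetical, and the sketch only builds the document; how it is sent to Solr is left out.

```python
import json


def flatten_person(person):
    """Flatten nested school objects into parallel multiValued fields.

    Hypothetical helper: index i of each list refers to the same school.
    """
    schools = person.get("school", [])
    return {
        "first_name": person["first_name"],
        "last_name": person["last_name"],
        "school_name": [s["name"] for s in schools],
        "school_degree": [s["degree"] for s in schools],
        "school_start_date": [s["startDate"] for s in schools],
    }


person = {
    "first_name": "abc",
    "last_name": "xyz",
    "school": [
        {"name": "some school", "degree": "x-Degree", "startDate": "12-02-2009"},
        {"name": "some school2", "degree": "x-Degree-2", "startDate": "12-02-2012"},
    ],
}

doc = flatten_person(person)
print(json.dumps(doc, indent=2))
```

The lists stay in the same order as the original school array, which is what any position-based matching later relies on.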


Searching a single field such as school_name is as simple as searching any ordinary field. Searching across multiple nested fields, however, has to be handled differently.

Combining SpanTermQuery instances with a FieldMaskingSpanQuery inside a SpanNearQuery lets you search the intersection of the schools' positions, and so correctly find the Person that contains the specified item (school_name:some school and school_degree:x-Degree):

SpanNearQuery(
    SpanTermQuery("school_name", "some school"),
    FieldMaskingSpanQuery(
        SpanTermQuery("school_degree", "x-Degree"),
        "school_name"
    ), -1, false
)

Reference
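The idea behind the position intersection can be modelled in plain Python. This is not Lucene code, only a sketch under the assumption that every value occupies exactly one position, so the i-th school_name and the i-th school_degree share position i. The helper names positions and same_position_match are hypothetical.

```python
def positions(values):
    """Map each value to the set of positions at which it occurs."""
    index = {}
    for pos, value in enumerate(values):
        index.setdefault(value, set()).add(pos)
    return index


def same_position_match(doc, name, degree):
    """True only if name and degree occur at the same position."""
    name_pos = positions(doc["school_name"]).get(name, set())
    degree_pos = positions(doc["school_degree"]).get(degree, set())
    # A non-empty intersection means one school entry carries both values.
    return bool(name_pos & degree_pos)


doc = {
    "school_name": ["some school", "some school2"],
    "school_degree": ["x-Degree", "x-Degree-2"],
}

print(same_position_match(doc, "some school", "x-Degree"))    # same entry
print(same_position_match(doc, "some school", "x-Degree-2"))  # different entries
```

The first call matches because both values sit at position 0; the second does not, because the name is at position 0 and the degree at position 1.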

Upvotes: 0

Lukasz Kujawa

Reputation: 3096

I'm sure you can achieve exactly what you want. There are many field types and community plugins. The only problem is that good documentation is hard to find.

You can obviously go for multiValued fields, as @pickypg suggested. The problem occurs when you try to search by school_name and school_degree in one query: the results will be incorrect, because a match can combine the name of one school with the degree of another.

What I'm doing for a slightly different problem is using the PointType class:
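A small Python model shows why the combined query goes wrong on flattened fields. It is only a sketch of the boolean logic, not of Solr itself, and naive_match is a hypothetical name standing in for a query that requires both school_name and school_degree to match.

```python
def naive_match(doc, name, degree):
    """Model of an AND over two multiValued fields.

    Each clause is checked against its own field independently, so the
    two values may come from different schools.
    """
    return name in doc["school_name"] and degree in doc["school_degree"]


doc = {
    "school_name": ["some school", "some school2"],
    "school_degree": ["x-Degree", "x-Degree-2"],
}

# The person never earned x-Degree-2 at "some school",
# yet the flattened document still matches.
false_positive = naive_match(doc, "some school", "x-Degree-2")
print(false_positive)
```

The document matches even though no single school entry has that name and that degree together, which is the incorrect result described above.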

<fieldType name="range" class="solr.PointType" dimension="1" subFieldType="double" />

<field name="cat_lr" type="range" indexed="true" stored="true" multiValued="true"/>

It allows me to have multiple ranges per document. I insert them like this:

cat_lr=2,5

and I look for them like this:

+cat_lr:[1 TO 10]

I hope that helps with your issue. Good luck with the documentation.

Upvotes: 1

pickypg

Reputation: 22332

Unfortunately, Solr is not as capable as Elasticsearch at defining nested documents.

In Solr's case, the answer is to use multiValued fields that mimic the desired information in the flattened document. Personally, I find this to be very limiting, particularly because grouped details (objects) may be separated, but it is the Solr way. You can use dynamic fields to fix this (e.g., school_name_1 is linked with school_degree_1, and school_name_2 with school_degree_2), as suggested by arun's referenced link, but that's a much bigger hassle compared to the flexibility of Elasticsearch.
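As a rough illustration of the dynamic-field approach, the Python sketch below writes each school into numbered fields and builds a query with one clause per index. The helper names, the max_schools parameter, and the assumption that the schema declares matching dynamicField patterns (for example school_name_*) are all hypothetical, and the query builder does not escape special characters in the values.

```python
def flatten_with_suffixes(person):
    """Write each school into numbered fields so its details stay linked."""
    doc = {
        "first_name": person["first_name"],
        "last_name": person["last_name"],
    }
    for i, school in enumerate(person.get("school", []), start=1):
        doc[f"school_name_{i}"] = school["name"]
        doc[f"school_degree_{i}"] = school["degree"]
        doc[f"school_start_date_{i}"] = school["startDate"]
    return doc


def build_query(name, degree, max_schools):
    """Build one AND clause per index, joined with OR (no escaping done)."""
    clauses = [
        f'(school_name_{i}:"{name}" AND school_degree_{i}:"{degree}")'
        for i in range(1, max_schools + 1)
    ]
    return " OR ".join(clauses)


person = {
    "first_name": "abc",
    "last_name": "xyz",
    "school": [
        {"name": "some school", "degree": "x-Degree", "startDate": "12-02-2009"},
        {"name": "some school2", "degree": "x-Degree-2", "startDate": "12-02-2012"},
    ],
}

doc = flatten_with_suffixes(person)
query = build_query("some school", "x-Degree", max_schools=2)
print(doc)
print(query)
```

The query grows with the maximum number of schools per person, which is part of the hassle mentioned above.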

If your document is in XML, then you can use the XPathEntityProcessor to automatically flatten it. Perhaps more unfortunately, I am not aware of any JSON processor that performs the analogous action.

You're going to want a schema similar to:

<field name="first_name" indexed="true" />
<field name="last_name" indexed="true" />
<field name="school_name" multiValued="true" indexed="true" />
<field name="school_degree" multiValued="true" indexed="true" />
<field name="school_start_date" multiValued="true" indexed="true" />

Don't forget about the end date. You may also want to consider that students can have multiple degrees, though this could be solved by simply doubling up on the school, or making the degree an array when it's the same starting year.

Upvotes: 1
