Thomas

Reputation: 1123

Solr design for searching in complex JSON

What would be a good design for using Solr to search in complex JSON? For instance there could be a document like:

{
    "books" : [
        {
            "title" : "Some title",
            "author" : "Some author",
            "genres" : [
                "thriller",
                "drama"
             ]
        },
        {
            "title" : "Some other title",
            "author" : "Some author",
            "genres" : [
                "comedy",
                "nonfiction",
                "thriller"
             ]
         }
    ]
 }

A sample query would be to get all documents that have a book whose author is "Some author" and one of the book's genres is "drama".

Right now the design I came up with is to have a dynamicField in the schema.xml that indexes everything as text (for now), like so:

 <dynamicField name="*" type="text" indexed="true" stored="true"/>

Then SolrJ is used to parse the JSON and create a SolrInputDocument with fields for each piece of data. For instance these are the field/values that would be created for the example JSON above:

books0.title : "Some title"
books0.author : "Some author"
books0.genres0 : "thriller"
books0.genres1 : "drama"
books1.title : "Some other title"
books1.author : "Some author"
books1.genres0 : "comedy"
books1.genres1 : "nonfiction"
books1.genres2 : "thriller"
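For illustration, the flattening step described above could look roughly like the sketch below. It assumes the JSON has already been parsed into nested `Map` and `List` objects by some JSON library; the class name `JsonFlattener` and its methods are hypothetical, and the result is a plain map rather than a SolrInputDocument so that the example runs without SolrJ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: flattens parsed JSON into
// field names such as "books0.title" and "books0.genres1".
public class JsonFlattener {

    public static Map<String, String> flatten(Object node) {
        Map<String, String> out = new LinkedHashMap<>();
        walk("", node, out);
        return out;
    }

    private static void walk(String prefix, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            // Object: append ".key" (no leading dot at the root).
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                String key = prefix.isEmpty()
                        ? e.getKey().toString()
                        : prefix + "." + e.getKey();
                walk(key, e.getValue(), out);
            }
        } else if (node instanceof List) {
            // Array: append the element index directly to the name.
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(prefix + i, list.get(i), out);
            }
        } else {
            // Scalar: store the value as text under the accumulated name.
            out.put(prefix, String.valueOf(node));
        }
    }

    public static void main(String[] args) {
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("title", "Some title");
        book.put("author", "Some author");
        book.put("genres", Arrays.asList("thriller", "drama"));

        Map<String, Object> root = new LinkedHashMap<>();
        root.put("books", Arrays.asList(book));

        // Prints one "name : value" line per flattened field.
        flatten(root).forEach((k, v) -> System.out.println(k + " : \"" + v + "\""));
    }
}
```

In a real indexer each entry of the returned map would presumably be copied into a SolrInputDocument with `addField(name, value)`.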

At this point we could use the LukeRequestHandler to get all the fields in the index, and then make a big Solr query that checks all the fields we are interested in. For the sample query above the query would check all "books#.author" and "books#.genres#" fields. This solution seems inelegant and the queries could get very big if there are a lot of fields.

Being able to use wildcards in field names would be useful, but I don't think that is possible with Solr.

Is there a better way to accomplish this, possibly by using some clever combination of "copyField" and "multiValued" in the schema?

Upvotes: 1

Views: 3284

Answers (2)

Jayendra

Reputation: 52779

You can index each book entity as its own document.

<field name="id" type="string" indexed="true" stored="true" required="true" />  
<field name="title" type="text_general" indexed="true" stored="true"/>   
<!-- Don't perform stemming on authors - You can use field with lower case, ascii folding for analysis -->   
<field name="authors" type="string" indexed="true" stored="true" multiValued="true"/>  
<field name="genre" type="string" indexed="true" stored="true" multiValued="true"/>  

Use the Dismax parser to search on authors and genre.
A match on these fields should return the document.
You can also use genre for filtering with a filter query, e.g. fq=genre:drama
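As a rough sketch of this approach, the code below splits the parsed "books" array into one flat document per book, using the field names from the schema above. The class `BookDocuments`, the `parentId` argument, and the id scheme (parent id plus book index) are assumptions made for illustration. The query string is shown unencoded, assumes the default Lucene query parser rather than Dismax, and expresses both conditions as filter queries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: one flat Solr-style document per book.
public class BookDocuments {

    public static List<Map<String, Object>> toBookDocs(
            String parentId, List<Map<String, Object>> books) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (int i = 0; i < books.size(); i++) {
            Map<String, Object> book = books.get(i);
            Map<String, Object> doc = new LinkedHashMap<>();
            // Assumed id scheme: keeps a link back to the original JSON document.
            doc.put("id", parentId + "-" + i);
            doc.put("title", book.get("title"));
            // "authors" and "genre" are multiValued in the schema.
            doc.put("authors", Collections.singletonList(book.get("author")));
            doc.put("genre", book.get("genres"));
            docs.add(doc);
        }
        return docs;
    }

    // Request parameters for "author is X and one genre is Y" (not URL-encoded).
    public static String sampleQuery(String author, String genre) {
        return "q=*:*&fq=authors:\"" + author + "\"&fq=genre:" + genre;
    }
}
```

Because each book is a separate document, the author and genre conditions are evaluated against the same book, which the flattened books#.author / books#.genres# fields cannot guarantee without enumerating every index.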

If you want different search behavior for a field, you can use copyField to copy it into another field that has a different analysis applied, e.g.

<field name="genre_search" type="text_general" indexed="true" stored="true" multiValued="true"/>

<copyField source="genre" dest="genre_search"/>

Upvotes: 2

Persimmonium

Reputation: 15791

It may be worth looking at Solr Joins. They are only available in 4.0, which is currently in alpha, but they could let you model some or all of these complex relationships. Performance is not as good as plain Solr without joins, but it may well be acceptable; you should verify this for your own data.

Upvotes: 0
