Priyanka

Reputation: 105

How to create Solr schema for hierarchical facet by splitting data into multiple fields at index time

I want to implement Solr hierarchical faceting for my application, which has a two-level hierarchy between Category and SubCategory. I want to use the pivot facet approach described at http://wiki.apache.org/solr/HierarchicalFaceting#Pivot_Facets.

The flattened data will be as below:

Doc#1: NonFic > Law
Doc#2: NonFic > Sci
Doc#3: NonFic > Sci > Phys

This data should be split into a separate field for each level of the hierarchy at index time, as shown below.

Indexed Terms

Doc#1: category_level0: NonFic; category_level1: Law
Doc#2: category_level0: NonFic; category_level1: Sci
Doc#3: category_level0: NonFic; category_level1: Sci; category_level2: Phys

Can anyone suggest how to implement this? How do I define the Solr schema to achieve it? I could not find any reference for splitting data like this at index time.

Thanks,

Priyanka

Upvotes: 3

Views: 2509

Answers (2)

Alex

Reputation: 68

To split the data, use a ScriptTransformer, which lets you transform each row with JavaScript inside your DataImportHandler config file.

Add the following to your db-data-config as a direct child of <dataConfig>, at the same level as <dataSource> and <document>. It defines a function that splits the string in a field on the delimiter > and adds one field per piece, named category_level0, category_level1, and so on.

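<script><![CDATA[
    // Split the delimited path on '>' and emit one field per level.
    function CategoryPieces(row) {
        var value = row.get('ColumnToSplit');
        if (value == null) {
            return row;  // nothing to split for this row
        }
        var pieces = value.split('>');
        for (var i = 0; i < pieces.length; i++) {
            // trim the spaces around each piece ("NonFic " becomes "NonFic")
            row.put('category_level' + i, pieces[i].trim());
        }
        return row;
    }
]]></script>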
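
To check the splitting logic before wiring it into Solr, the same function can be run on its own. This is only a sketch: makeRow is a hypothetical stand-in for the row that the DataImportHandler passes in (it only needs get and put), and ColumnToSplit is the placeholder column name used above.

```javascript
// Same splitting logic as CategoryPieces, runnable outside Solr.
function categoryPieces(row) {
    var raw = row.get('ColumnToSplit');
    if (raw === null || raw === undefined) {
        return row; // nothing to split
    }
    var pieces = String(raw).split('>');
    for (var i = 0; i < pieces.length; i++) {
        // one field per hierarchy level, with surrounding spaces removed
        row.put('category_level' + i, pieces[i].trim());
    }
    return row;
}

// Hypothetical stand-in for the DIH row: exposes get/put over a plain object.
function makeRow(value) {
    var data = { ColumnToSplit: value };
    return {
        get: function (key) {
            return Object.prototype.hasOwnProperty.call(data, key) ? data[key] : null;
        },
        put: function (key, val) { data[key] = val; },
        data: data
    };
}

var row = categoryPieces(makeRow('NonFic > Sci > Phys'));
console.log(row.data);
```

For the input 'NonFic > Sci > Phys' this sets category_level0 to NonFic, category_level1 to Sci and category_level2 to Phys, which matches the indexed terms asked for in the question.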

Then, in your main <entity> tag, add transformer="script:CategoryPieces", and add one <field> mapping per hierarchy level to its field list.

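<field column="category_level0" name="Category_Level0" />
<field column="category_level1" name="Category_Level1" />
<field column="category_level2" name="Category_Level2" />

Last, add the new fields to your schema.xml, one per level that can occur in your data (three in the question's example).

<field name="Category_Level0" type="string" indexed="true" stored="true" multiValued="false" />
<field name="Category_Level1" type="string" indexed="true" stored="true" multiValued="false" />
<field name="Category_Level2" type="string" indexed="true" stored="true" multiValued="false" />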

Upvotes: 0

Alexandre Rafalovitch

Reputation: 9789

Do you need to display those individual fields in the documents returned? If so, the split values must be in the 'stored' form of the field. If you only need them for searching or faceting, you can ignore the 'stored' form and concentrate on the 'indexed' form.

In either case, if you need to split one field into several, you can do that with copyField or with an UpdateRequestProcessor.

With copyField, the 'stored' form will be the same for all fields, because copyField copies the raw input value. You can still give each destination field its own field type and analyzer, so that each one picks a different part of the hierarchy for the 'indexed' form.

With an UpdateRequestProcessor, you can write a custom processor that takes one field and emits several fields, each with only its part of the path. Alternatively, you can chain built-in processors: copy the field a few times (for example with CloneFieldUpdateProcessorFactory) and then apply a different regex processor (for example RegexReplaceProcessorFactory) to each copy.
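
One way to sketch the custom processor route without writing Java is a script run by solr.StatelessScriptUpdateProcessorFactory. The following is a minimal sketch under assumptions, not a tested configuration: the source field name category is hypothetical, the > delimiter comes from the question, and the script engine is assumed to support String.prototype.trim. The no-op functions at the end are included on the assumption that the factory expects them to be defined, as Solr's example update script does.

```javascript
// update-script.js: sketch for solr.StatelessScriptUpdateProcessorFactory.
// Assumes the flattened path arrives in a field named "category" (hypothetical).
function processAdd(cmd) {
    var doc = cmd.solrDoc; // the SolrInputDocument being added
    var raw = doc.getFieldValue('category');
    if (raw === null || raw === undefined) {
        return; // document has no hierarchy path
    }
    var pieces = String(raw).split('>');
    for (var i = 0; i < pieces.length; i++) {
        // each level goes into its own field: category_level0, category_level1, ...
        doc.setField('category_level' + i, pieces[i].trim());
    }
}

// Remaining hooks, left as no-ops in this sketch.
function processDelete(cmd) {}
function processMergeIndexes(cmd) {}
function processCommit(cmd) {}
function processRollback(cmd) {}
function finish() {}
```

Because the new fields are set on the input document itself, both their 'stored' and 'indexed' forms hold only the value for that level, which is the difference from the copyField approach.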

Upvotes: 1
