Ke.

Reputation: 2586

multiple facets in field separated by semicolon

I have a field with data like this. Here is an example row:

London;Greater London;London City

I would like to end up with the following facets

London (count 10) Greater London (count 5) London City (count 2)

I'm just stuck on the right query to use.

Can Solr have multiple facets within a single field?

Cheers

k

Upvotes: 0

Views: 948

Answers (2)

Christine Salter

Reputation: 143

I found a pretty good option based on Alexandre Rafalovitch's suggestion. Instead of getting into Java, I wrote some simple JavaScript and called it from a StatelessScriptUpdateProcessorFactory. In my case, I have a couple of fields I need to split this way, so some of my code reflects that.

I should also point out that this is basically prototype code. You'd probably want to spend a little time improving it, making it easier to configure, etc. I know I will! But by the time I finish with that, I'll probably forget to update this question, and I figure a hacky answer is better than no answer at all. :-) (I'll try to remember to come back and update once I'm done, but...)

Now, in solrconfig.xml, I added an updateRequestProcessorChain:

  <updateRequestProcessorChain name="splitting">
    <processor class="solr.StatelessScriptUpdateProcessorFactory">
      <str name="script">split-script.js</str>
      <lst name="params">
        <str name="splitFields">firstfield,secondfield</str>
      </lst>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

I also added this to the "defaults" config in /update/extract:

<str name="update.chain">splitting</str>

For split-script.js, I just copied the sample update-script.js already in the conf folder and modified the processAdd function:

function processAdd(cmd) {

  var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
  var id = doc.getFieldValue("id");
  logger.info("splitter-script#processAdd: id=" + id);

  var fields_param = params.get('splitFields');  // "params" only exists if processor configured with <lst name="params">

  var fields = fields_param.split(',');
  for (var i = 0; i < fields.length; i++)
  {
    var fieldName = fields[i];
    var field = doc.getField(fieldName);
    if (field)
    {
      var value = field.getValue();
      if (value)
      {
        // Remove the old field so the un-split value doesn't also show up in the list...
        doc.removeField(fieldName);
        doc.addField(fieldName, value.split(';'));
      }
    }
  }
}

Seems to be working for me, hopefully it helps someone else too!
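
If you want to check the splitting logic outside Solr first, here is a minimal sketch that runs in plain JavaScript. FakeDoc is a hypothetical stand-in that only mimics the few SolrInputDocument methods used above, and splitFields is an illustrative name, not part of Solr. Unlike the script above, it also trims whitespace and drops empty pieces, which is an assumption about what you'd want for facet values.

```javascript
// Hypothetical stand-in for SolrInputDocument: only the methods the sketch needs.
function FakeDoc(fields) {
  this.fields = fields;
}
FakeDoc.prototype.getFieldValue = function (name) {
  return this.fields.hasOwnProperty(name) ? this.fields[name] : null;
};
FakeDoc.prototype.removeField = function (name) {
  delete this.fields[name];
};
FakeDoc.prototype.addField = function (name, value) {
  this.fields[name] = value;
};

// Split each listed field on ';', trimming whitespace and dropping empty pieces.
// Fields that are missing or empty are left untouched.
function splitFields(doc, fieldNames) {
  for (var i = 0; i < fieldNames.length; i++) {
    var name = fieldNames[i];
    var value = doc.getFieldValue(name);
    if (typeof value !== 'string' || value.length === 0) {
      continue;
    }
    var parts = value.split(';')
      .map(function (p) { return p.trim(); })
      .filter(function (p) { return p.length > 0; });
    // Remove the old field so the un-split value doesn't linger.
    doc.removeField(name);
    doc.addField(name, parts);
  }
  return doc;
}

var doc = splitFields(
  new FakeDoc({ id: '1', firstfield: 'London;Greater London;London City' }),
  ['firstfield', 'secondfield']
);
console.log(JSON.stringify(doc.fields.firstfield));
// prints ["London","Greater London","London City"]
```

Inside Solr the values you get back are Java objects rather than JavaScript strings, so treat this as a way to reason about the logic, not as a drop-in replacement for processAdd.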

Upvotes: 0

Alexandre Rafalovitch

Reputation: 9789

You have two options.

The best one is to use a multi-valued field, which means splitting the incoming content on the semicolons. How you do that depends on how you get the data in. For example, the CSV loader lets you declare the field as multi-valued and split it on the semicolon. DataImportHandler has RegexTransformer, which can also split content. Or you could use an UpdateRequestProcessor, which applies to any source, but I don't think there is a splitting one out of the box, so you would need to write one.

The other option relies on the fact that faceting uses the indexed (tokenized) values rather than the stored values. That is exactly why faceted fields are usually defined as strings. However, if you cannot get the first approach to work at all (and you should try hard to), you can configure a special field type that just splits the tokens on semicolons and does no other processing. You would use PatternTokenizerFactory for that.
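
As a rough sketch of that second option, the schema entries might look something like this. The type name semicolon_delimited and the field name location are made up for illustration; adjust them to your schema.

```xml
<!-- Hypothetical type name: tokenizes on ';' and applies no other filters -->
<fieldType name="semicolon_delimited" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field name -->
<field name="location" type="semicolon_delimited" indexed="true" stored="true"/>
```

With either approach the facet request itself is the ordinary one, for example q=*:*&rows=0&facet=true&facet.field=location. Keep in mind that with the tokenizer approach only the indexed tokens are split; the stored value returned in search results is still the original semicolon-separated string.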

Upvotes: 1
