Searching Solr index for concatenated words

Question

I'm struggling with two similar use cases.

Here's an example document from my index:

{
        "id":"E850AC8D844010AFA76203B390DD3135",
        "brand_txt_en":"Tom Ford",
        "catch_all":["Tom Ford",
          "FT 5163",
          "Tom Ford",
          "FT 5163",
          "DARK HAVANA"],
        "model_txt_en":"FT 5163",
        "brand_txt_en_split":"Tom Ford",
        "model_txt_en_split":"FT 5163",
        "color_txt_en":"DARK HAVANA",
        "material_s":"acetato",
        "gender_s":"uomo",
        "shape_s":"Wayfarer",
        "lens_s":"cerchiata",
        "modelkey_s":"86_1_FT 5163",
        "sales_i":0,
        "brand_s":"Tom Ford",
        "model_s":"FT 5163",
        "color_s":"DARK HAVANA",
        "_version_":1569456572504997895
}

Query: brand_txt_en_split:tomford

No results!

The field type is Solr's default one.

I expected WordDelimiterFilterFactory to generate a "tomford" token by concatenating the words, but that doesn't seem to happen.

The 'inverse' use case is:

{ 
   ...  "model_txt_en_split": "The Clubmaster", ...
}

I want that document to be found by this query: club master

I guess I should use the EdgeNGram filter for the latter case, but I can't figure out how to do that.

Thanks for your help

Abhijit Bashetti · Accepted Answer

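WordDelimiterFilterFactory has the catenateWords and catenateAll options, but they join the parts of a single token that contains delimiters. They do not join separate tokens such as "Tom" and "Ford", which the tokenizer has already split on whitespace: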

catenateWords: (integer, default 0) If non-zero, maximal runs of word parts will be joined: "hot-spot-sensor's" -> "hotspotsensor"

catenateAll: (0/1, default 0) If non-zero, runs of word and number parts will be joined: "Zap-Master-9000" -> "ZapMaster9000"
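
As a sketch of where these options go, assuming a whitespace-tokenized field type (the name text_catenate and the option values are hypothetical; newer Solr releases also offer WordDelimiterGraphFilterFactory as the replacement for this filter):

```xml
<!-- Hypothetical field type: name and option values are assumptions for illustration -->
<fieldType name="text_catenate" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits on whitespace only, so "Tom Ford" is already two tokens before any filter runs -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Works inside one token: "Tom-Ford" yields "Tom", "Ford" and the catenated "TomFord" -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```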

To remove the space between the words please try the below filter.
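
One possible setup, as a minimal sketch: keep the whole field value as a single token with KeywordTokenizerFactory, then strip whitespace with PatternReplaceFilterFactory. The type name text_nospace and the regex are assumptions, not taken from the original schema.

```xml
<!-- Hypothetical field type: name and pattern are assumptions for illustration -->
<fieldType name="text_nospace" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Emits the entire input as one token: "Tom Ford" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Removes every run of whitespace: "Tom Ford" becomes "TomFord" -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s+" replacement="" replace="all"/>
    <!-- Lowercases so the query tomford can match: "TomFord" becomes "tomford" -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same analyzer runs at query time, a field of this type would match tomford. It would no longer match the single word ford, so it is usually better kept alongside the regular text field (for example, filled through a copyField) than used as a replacement for it.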

Once you have updated schema.xml, restart the server and re-index the data.

You can try the fieldType below for your field.

Input String: "John Oliver W Clane"

Tokenizer to Filter: "John Oliver W Clane"

Output Tokens :

"John", "John ", "John O", "John Ol", "John Oli", "John Oliv", "John Olive", "John Oliver", "John Oliver ", "John Oliver W", "John Oliver W ", "John Oliver W C", "John Oliver W Cl", "John Oliver W Cla", "John Oliver W Clan", "John Oliver W Clane".
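
Tokens like these are what a KeywordTokenizerFactory followed by an EdgeNGramFilterFactory produces. A minimal sketch, assuming minGramSize="4" and maxGramSize="25" (values chosen to be consistent with the output above, not confirmed by the source) and a hypothetical type name text_prefix:

```xml
<!-- Hypothetical field type: name and gram sizes are assumptions for illustration -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- The whole value stays one token, spaces included -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Emits every prefix of that token from 4 up to 25 characters -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams at query time: the typed text is matched against the indexed prefixes -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

As written, matching is case-sensitive, like the tokens listed above. Adding solr.LowerCaseFilterFactory to both analyzers would make it case-insensitive.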

There is another filter you can try for the same purpose.

You can read more about analyzers and filters in Solr Analyzers and Filters.
