Vingtoft

Reputation: 14656

ElasticSearch | How to search a list of strings containing whitespace?

I'm building an app where users can enter their skills and companies can search (using ElasticSearch) for users with specific skills.

I create an index like this:

// Create the "candidates" index with text mappings for both fields
client.indices.create({
  index: "candidates",
  body: {
    mappings: {
      candidate: {
        properties: {
          languages: {type: 'text'},
          skills: {type: 'text'},
        },
      },
    },
  },
}, (err, data) => {
  if (err) console.log('err ', err);
  if (data) console.log('data ', data);
});

In the following example, I want to search for users whose skills include "Facebook Ads" and "Online Marketing".

Results should be sorted so that users with two matches are at the top.

{
  "index": "candidates",
  "type": "candidate",
  "size": 10000,
  "body": {
    "query": {
      "bool": {
        "must": [
          {
            "bool": {
              "should": {
                "terms": {
                  "skills": [
                    "facebook ads",
                    "online marketing"
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}

The above query returns zero results.

Problem: As explained here, I should avoid using term (or terms) queries on text fields.

Question: How can I implement a search query that takes an array of strings (some of which contain spaces) as input and returns a list of ordered hits? By ordered hits I mean that users who match the most skills in the query should be at the top.

EDIT

Here is an example of a user who has skills with both Facebook Ads and Google Ads:

{
  "_index" : "candidates",
  "_type" : "candidate",
  "_id" : "2fbbd818-sdhkfgkjhg-3235465hgfds",
  "_score" : 9.1202545,
  "_source" : {
    "skills" : [
      "Online strategi",
      "Facebook Ads",
      "Google Ads"
    ],
    "languages": [
      "da",
      "en"
    ]
  }
}

A search for ['Facebook Ads', 'Google Ads'] should return the above user at the top (matches both Facebook Ads and Google Ads), but users with only one match should also be returned.

Upvotes: 1

Views: 2122

Answers (2)

JBone

Reputation: 1836

OK, here is what I did:

1) Created the mappings for the data.
2) Indexed three documents. One is the same document you posted above, one
   is completely irrelevant, and the third matches one search term, so it is
   less relevant than the first document but more relevant than the
   irrelevant one.
3) Ran the search query.

When I ran the search, the most relevant document showed up at the top with the most matches, followed by the second document.

Note that I am passing multiple strings in the search query, as you wanted, using double quotes around the whole query and single quotes around each string. You can build an array of strings or a single string of concatenated strings (with spaces, as you wanted); either should work.
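
To illustrate the "string with concatenated strings" option, here is a minimal sketch that builds the request body from an array of skills. The helper name buildMultiMatchBody is made up for this example, and the body is only constructed, not sent to a cluster.

```javascript
// Hypothetical helper (name is an assumption): joins an array of skill
// strings into one query string for a multi_match query on `skills`.
function buildMultiMatchBody(skills) {
  return {
    query: {
      multi_match: {
        // The joined string is analyzed by Elasticsearch at search time,
        // so it is split back into individual words.
        query: skills.join(' '),
        fields: ['skills'],
      },
    },
  };
}

const multiMatchBody = buildMultiMatchBody(['Facebook Ads', 'Google Ads']);
console.log(JSON.stringify(multiMatchBody, null, 2));
```

Keep in mind that with a text field this matches on individual words rather than whole skill names, so a document whose skills only mention "Google" could still be returned, just with a lower score.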

Here are the mappings:

PUT ugi-index2
{
  "mappings": {
    "_doc": {
      "properties": {
        "skills": {"type": "text"},
        "languages": {"type": "keyword"}
      }
    }
  }
}

And the three documents that I indexed:

POST /ugi-index2/_doc/3
{
  "skills" : [
    "no skill",
    "Facebook ads",
    "not related"
  ],
  "languages": [
    "ab",
    "cd"
  ]
}

POST /ugi-index2/_doc/2
{
  "skills" : [
    "no skill",
    "test skill",
    "not related"
  ],
  "languages": [
    "ab",
    "cd"
  ]
}

POST /ugi-index2/_doc/1
{
  "skills" : [
    "Online strategi",
    "Facebook Ads",
    "Google Ads"
  ],
  "languages": [
    "da",
    "en"
  ]
}

And the search query:

GET /ugi-index2/_search
{
  "query": {
    "multi_match": {
      "query": "'Online Strate', 'Facebook'",
      "fields": ["skills"]
    }
  }
}

Note how the query above passes multiple strings containing spaces in a single search.

And here is the response:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "ugi-index2",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "skills" : [
            "Online strategi",
            "Facebook Ads",
            "Google Ads"
          ],
          "languages" : [
            "da",
            "en"
          ]
        }
      },
      {
        "_index" : "ugi-index2",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "skills" : [
            "no skill",
            "Facebook ads",
            "not related"
          ],
          "languages" : [
            "ab",
            "cd"
          ]
        }
      }
    ]
  }
}

Upvotes: 2

Mike Frank

Reputation: 389

If you want to match the exact term you would want to make sure you also store the skill as a keyword. This will leave the space intact and allow for an exact match. The common way to utilize this in a user interface is to provide a filter with the keyword data as predefined filter options.
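
As a rough sketch of the keyword approach (not tested against a cluster), the mapping could give skills a keyword sub-field, and the query could use one term clause per skill. The sub-field name raw and the helper name buildExactSkillsQuery are made up for illustration.

```javascript
// Assumed mapping fragment: `skills` stays a text field and gains a
// keyword sub-field. The sub-field name `raw` is a placeholder.
const skillsProperties = {
  skills: {
    type: 'text',
    fields: { raw: { type: 'keyword' } },
  },
  languages: { type: 'keyword' },
};

// Hypothetical helper: one `term` clause per skill inside bool/should.
// Each matching clause adds to the score, so candidates that match
// more skills should rank higher.
function buildExactSkillsQuery(skills) {
  return {
    query: {
      bool: {
        should: skills.map((skill) => ({ term: { 'skills.raw': skill } })),
        minimum_should_match: 1, // still return candidates with one match
      },
    },
  };
}

const exactQuery = buildExactSkillsQuery(['Facebook Ads', 'Google Ads']);
console.log(JSON.stringify(exactQuery, null, 2));
```

Without further configuration a term query on a keyword field is case-sensitive, so "facebook ads" would not match a stored "Facebook Ads". That is one reason predefined filter options work well here.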

If you still want to use a full-text search where the user can provide arbitrary search input, you can rely on the fact that a doc containing "Facebook" and "Ads" will return with a higher score than a doc containing only "Facebook".
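
One way to sketch the full-text variant is a bool query with one match clause per skill on the text field. The helper name buildFullTextSkillsQuery is an assumption for this example, and the body is only built here, not run against an index.

```javascript
// Hypothetical helper: one `match` clause per skill on the analyzed
// `skills` text field. Scores of matching clauses are added together,
// so a candidate matching more words and more skills should score higher.
function buildFullTextSkillsQuery(skills) {
  return {
    query: {
      bool: {
        should: skills.map((skill) => ({ match: { skills: skill } })),
        minimum_should_match: 1, // keep candidates that match only one skill
      },
    },
  };
}

const fullTextQuery = buildFullTextSkillsQuery(['Facebook Ads', 'Google Ads']);
console.log(JSON.stringify(fullTextQuery, null, 2));
```

Swapping match for match_phrase in each clause would require the words of a skill to appear next to each other, which is stricter than what is described above.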

Upvotes: 1
