I'm building an app where users can enter their skills and companies can search (using Elasticsearch) for users with specific skills.
I create an index like this:
client.indices.create({
  index: "candidates",
  body: {
    mappings: {
      candidate: {
        properties: {
          languages: { type: 'text' },
          skills: { type: 'text' },
        },
      },
    },
  },
}, (err, data) => {
  // Log whichever of the error or the response came back
  if (err) console.log('err ', err);
  if (data) console.log('data ', data);
});
In the following example, I want to search for users who have the skills "Facebook Ads" and "Online Marketing".
Results should be sorted so that users who match both skills are at the top.
{
  "index": "candidates",
  "type": "candidate",
  "size": 10000,
  "body": {
    "query": {
      "bool": {
        "must": [
          {
            "bool": {
              "should": {
                "terms": {
                  "skills": [
                    "facebook ads",
                    "online marketing"
                  ]
                }
              }
            }
          }
        ]
      }
    }
  }
}
The query above returns zero results.
Problem: As explained here, I should avoid using term (or terms) queries on text fields.
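The terms query finds nothing because of what a text field actually stores. The sketch below uses `approxStandardAnalyze`, a hypothetical helper that roughly approximates the `standard` analyzer by lowercasing and splitting on anything that is not a letter or digit (the real tokenizer follows Unicode word-break rules). It shows that the indexed terms are single words, while `terms` looks for each value, such as "facebook ads", as one exact, unanalyzed term.

```javascript
// Rough approximation of the `standard` analyzer: lowercase, then split on
// runs of characters that are not letters or digits. Not the real tokenizer.
function approxStandardAnalyze(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}

// Every value of a `text` field is broken into single-word terms.
const storedSkills = ["Online strategi", "Facebook Ads", "Google Ads"];
const indexedTerms = new Set(storedSkills.flatMap(approxStandardAnalyze));

// A `terms` query is not analyzed: each value must exist as one exact term.
const searchedTerms = ["facebook ads", "online marketing"];
const termsHit = searchedTerms.some((term) => indexedTerms.has(term));

console.log([...indexedTerms]); // [ 'online', 'strategi', 'facebook', 'ads', 'google' ]
console.log(termsHit); // false
```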
Question: How can I implement a search query that takes an array of strings (some of which contain spaces) as input and returns a ranked list of hits? By ranked, I mean that users who match the most skills in the query should be at the top.
EDIT
Here is an example of a user who has both Facebook Ads and Google Ads among their skills:
{
  "_index" : "candidates",
  "_type" : "candidate",
  "_id" : "2fbbd818-sdhkfgkjhg-3235465hgfds",
  "_score" : 9.1202545,
  "_source" : {
    "skills" : [
      "Online strategi",
      "Facebook Ads",
      "Google Ads"
    ],
    "languages": [
      "da",
      "en"
    ]
  }
}
A search for ['Facebook Ads', 'Google Ads'] should return the above user at the top (matches both Facebook Ads and Google Ads), but users with only one match should also be returned.
OK, here is what I did:
1) Created the mappings for the data.
2) Indexed 3 documents. The first is the same one you posted above, the second is completely irrelevant, and the third matches one of the search terms, so it should be less relevant than the first document but more relevant than the second.
3) Ran the search query.
When I ran the search, the most relevant document (the one with the most matches) showed up at the top, followed by the third document. The irrelevant document was not returned at all.
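Note that the query passes several skills in one string: the whole query is in double quotes and each skill is in single quotes. The single quotes and commas are not special syntax here; the analyzer strips them and matches the remaining words individually, so any one word is enough for a hit and documents matching more words score higher. You can therefore build the query text by concatenating your array of skills into one string (spaces within a skill are fine).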
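Building that query text from an array could look like the sketch below. `buildMultiMatchBody` is a hypothetical helper; it only produces the request body, which you would then pass to the JavaScript client from the question, for example as `client.search({ index: "ugi-index2", body: multiMatchBody })`.

```javascript
// Hypothetical helper: join the requested skills into one query string for
// the multi_match query used in this answer. No quoting is needed, because
// the analyzer splits the text into individual words anyway.
function buildMultiMatchBody(skills) {
  return {
    query: {
      multi_match: {
        query: skills.join(" "),
        fields: ["skills"],
      },
    },
  };
}

const multiMatchBody = buildMultiMatchBody(["Facebook Ads", "Google Ads"]);
console.log(JSON.stringify(multiMatchBody));
// {"query":{"multi_match":{"query":"Facebook Ads Google Ads","fields":["skills"]}}}
```

Keep in mind that the words are matched one by one, so a search for "Facebook Ads" also matches a user who only has "Google Ads" (through the word "ads"); such users usually just score lower than users matching more words.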
Here are the mappings:
PUT ugi-index2
{
  "mappings": {
    "_doc": {
      "properties": {
        "skills": {"type": "text"},
        "languages": {"type": "keyword"}
      }
    }
  }
}
And the three documents I indexed:
POST /ugi-index2/_doc/3
{
  "skills" : [
    "no skill",
    "Facebook ads",
    "not related"
  ],
  "languages": [
    "ab",
    "cd"
  ]
}
POST /ugi-index2/_doc/2
{
  "skills" : [
    "no skill",
    "test skill",
    "not related"
  ],
  "languages": [
    "ab",
    "cd"
  ]
}
POST /ugi-index2/_doc/1
{
  "skills" : [
    "Online strategi",
    "Facebook Ads",
    "Google Ads"
  ],
  "languages": [
    "da",
    "en"
  ]
}
And the search query:
GET /ugi-index2/_search
{
  "query": {
    "multi_match": {
      "query": "'Online Strate', 'Facebook'",
      "fields": ["skills"]
    }
  }
}
Note that the query above passes several search strings, some containing spaces, in a single query.
Here is the response:
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.5753642,
    "hits" : [
      {
        "_index" : "ugi-index2",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.5753642,
        "_source" : {
          "skills" : [
            "Online strategi",
            "Facebook Ads",
            "Google Ads"
          ],
          "languages" : [
            "da",
            "en"
          ]
        }
      },
      {
        "_index" : "ugi-index2",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "skills" : [
            "no skill",
            "Facebook ads",
            "not related"
          ],
          "languages" : [
            "ab",
            "cd"
          ]
        }
      }
    ]
  }
}
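The two scores are consistent with a count of matching words. Document 1 matches `online` and `facebook` (`strate` does not match `strategi`, since the standard analyzer does no stemming or prefix matching), while document 3 matches only `facebook`. The sketch below reproduces the numbers under an assumption that is not confirmed by the response: each document sits alone on its own shard, and, with the default `query_then_fetch` search type, each shard scores with its own term statistics. `bm25TermScore` is a hypothetical helper implementing the BM25 formula used by Lucene 7 (the Lucene version behind Elasticsearch 6.x), with the default `k1 = 1.2` and `b = 0.75`.

```javascript
// BM25 score of one matching term, Lucene 7 style (includes the k1 + 1 factor).
// Hypothetical helper for illustration, not an Elasticsearch API.
function bm25TermScore({ docCount, docFreq, freq, docLen, avgDocLen, k1 = 1.2, b = 0.75 }) {
  const idf = Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
  const tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + (b * docLen) / avgDocLen));
  return idf * tfNorm;
}

// Assumption: one document per shard, so docCount = docFreq = 1, and the
// document length equals the shard's average length (6 terms each).
const perTerm = bm25TermScore({ docCount: 1, docFreq: 1, freq: 1, docLen: 6, avgDocLen: 6 });

const doc1Score = perTerm + perTerm; // "online" + "facebook"
const doc3Score = perTerm; // "facebook" only

console.log(doc1Score.toFixed(6), doc3Score.toFixed(6)); // 0.575364 0.287682
```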
If you want to match a skill exactly, make sure you also index it as a keyword field. A keyword value is not split into words, so the space is kept intact and an exact match is possible. A common way to use this in a user interface is to offer the keyword values as predefined filter options.
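One way to act on this, sketched under assumptions: map `skills` as `text` with a `keyword` sub-field (called `raw` here; the name is arbitrary), then build one `should` clause per requested skill. Wrapping each exact `term` clause in `constant_score` makes every matching skill add the same amount, so on Elasticsearch 6 and later a user's score is the number of requested skills they have, and users with the most matches come first. The sub-field name and both helper names are hypothetical. Keyword values are compared exactly, including case, so "Facebook ads" would not match "Facebook Ads" unless you also configure a normalizer with a lowercase filter on the sub-field.

```javascript
// Mapping for the question's "candidates" index (Elasticsearch 6.x style,
// with the "candidate" type): full text on `skills`, plus an exact
// `skills.raw` keyword sub-field. "raw" is an assumed, arbitrary name.
const candidatesMapping = {
  mappings: {
    candidate: {
      properties: {
        languages: { type: "text" },
        skills: {
          type: "text",
          fields: { raw: { type: "keyword" } },
        },
      },
    },
  },
};

// Hypothetical helper: one constant-score clause per requested skill, so
// every exact skill match adds 1 to the user's score.
function buildExactSkillsBody(skills) {
  return {
    query: {
      bool: {
        should: skills.map((skill) => ({
          constant_score: {
            filter: { term: { "skills.raw": skill } },
            boost: 1,
          },
        })),
        minimum_should_match: 1,
      },
    },
  };
}

const exactSkillsBody = buildExactSkillsBody(["Facebook Ads", "Google Ads"]);
console.log(JSON.stringify(exactSkillsBody.query.bool.should[0]));
// {"constant_score":{"filter":{"term":{"skills.raw":"Facebook Ads"}},"boost":1}}
```

The mapping would go in the body of `client.indices.create` from the question, and the query body to `client.search({ index: "candidates", body: exactSkillsBody })`.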
If you still want full-text search, where users can type arbitrary text, you can rely on the fact that a document containing both "Facebook" and "Ads" gets a higher score than a document containing only "Facebook".
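A common full-text variant, which is a different technique from the single multi_match string in the other answer, is a `bool` query with one `match_phrase` clause per requested skill. Each phrase is analyzed like the indexed text, so "Facebook Ads" also matches "Facebook ads", but its words must appear next to each other in order, so "Google Ads" alone does not satisfy a search for "Facebook Ads". Scores of matching `should` clauses are added together, so users matching more skills usually rank higher; term statistics still influence each clause's score, so if you need a strict count of matched skills, each clause can be wrapped in a `constant_score` query. `buildPhraseSkillsBody` is a hypothetical helper.

```javascript
// Hypothetical helper: one match_phrase clause per requested skill. A user
// matching more phrases collects more should-clause scores.
function buildPhraseSkillsBody(skills) {
  return {
    query: {
      bool: {
        should: skills.map((skill) => ({ match_phrase: { skills: skill } })),
        minimum_should_match: 1,
      },
    },
  };
}

const phraseSkillsBody = buildPhraseSkillsBody(["Facebook Ads", "Online Marketing"]);
console.log(JSON.stringify(phraseSkillsBody.query.bool.should));
// [{"match_phrase":{"skills":"Facebook Ads"}},{"match_phrase":{"skills":"Online Marketing"}}]
```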