Reputation: 15911
I have an Azure Search index that contains person data such as first name and last name.
When I search for three-letter last names with a query like
/indexes/customers-index/docs?api-version=2016-09-01&search=rau&searchFields=LastName
the name Rau is found, but it appears far down in the results.
{
"@odata.context": "myurl/indexes('customers-index')/$metadata#docs(ID,FirstName,LastName)",
"value": [
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Liebetrau"
},
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Damerau"
},
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Rau"
Names like "Liebetrau" and "Damerau" rank above it.
Is there a way to have exact matches at the top?
EDIT
Querying the index definition through the REST API
GET https://myproduct.search.windows.net/indexes('customers-index')?api-version=2015-02-28-Preview
returned the following for LastName:
"name": "LastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": "prefix",
"searchAnalyzer": "standard",
"analyzer": null,
"synonymMaps": []
Edit 1
The analyzer definition
"scoringProfiles": [],
"defaultScoringProfile": null,
"corsOptions": null,
"suggesters": [],
"analyzers": [
{
"name": "prefix",
"tokenizer": "standard",
"tokenFilters": [
"lowercase",
"my_edgeNGram"
],
"charFilters": []
}
],
"tokenizers": [],
"tokenFilters": [
{
"name": "my_edgeNGram",
"minGram": 2,
"maxGram": 20,
"side": "back"
}
],
"charFilters": []
Edit 2
In the end, specifying a scoring profile that I use when querying did the trick:
{
"name": "person-index",
"fields": [
{
"name": "ID",
"type": "Edm.String",
"searchable": false,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": true,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": null
}
,
{
"name": "LastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"analyzer": "my_standard"
},
{
"name": "PartialLastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": "prefix",
"searchAnalyzer": "standard",
"analyzer": null
}
],
"analyzers":[
{
"name":"my_standard",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "asciifolding" ]
},
{
"name":"prefix",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "my_edgeNGram" ]
}
],
"tokenFilters":[
{
"name":"my_edgeNGram",
"@odata.type":"#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"minGram":2,
"maxGram":20,
"side": "back"
}
],
"scoringProfiles":[
{
"name":"exactFirst",
"text":{
"weights":{ "LastName":2, "PartialLastName":1 }
}
}
]
}
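For reference, a query against this index might look like the sketch below. It only builds the request path and sends nothing. The `scoringProfile` and `searchFields` query parameters are part of the Azure Search REST API; the helper name `build_search_path` and the choice to search both fields at once are assumptions for illustration. Presumably the profile helps because "rau" matches LastName only for an exact hit, while PartialLastName matches every name ending in "rau", so the weighted LastName match lifts Rau to the top.

```python
from urllib.parse import urlencode


def build_search_path(index_name, term, api_version="2016-09-01"):
    """Build the relative search URL (hypothetical helper, not part of any SDK)."""
    params = {
        "api-version": api_version,
        "search": term,
        # Search the exact field and the n-gram field together.
        "searchFields": "LastName,PartialLastName",
        # Name of the scoring profile defined in the index above.
        "scoringProfile": "exactFirst",
    }
    # safe="," keeps the comma in searchFields readable instead of %2C.
    return f"/indexes/{index_name}/docs?{urlencode(params, safe=',')}"


path = build_search_path("person-index", "rau")
print(path)
```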
Upvotes: 0
Views: 1756
Reputation: 1972
The "prefix" analyzer set as the index analyzer on the LastName field produces the following terms for the name Liebetrau: au, rau, trau, etrau, betrau, ebetrau, iebetrau, liebetrau.
These are edge n-grams of length 2 to 20, taken from the back of the word, as defined in the my_edgeNGram token filter in your index definition. The analyzer processes other names in the same way.
When you search for the name rau, the query term matches every name that ends with those characters, because each of them was indexed with the term rau. That's why all documents in your result set have the same relevance score.
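The behaviour can be reproduced outside the service with a small simulation. This is only a sketch of what a back-side edge n-gram filter does, not the service's actual implementation; the function name `back_edge_ngrams` is made up for illustration, and the query side is simplified to a single lowercase term.

```python
def back_edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the suffixes of `token` with lengths min_gram..max_gram (side="back")."""
    token = token.lower()  # mirrors the "lowercase" token filter
    longest = min(max_gram, len(token))
    return [token[-n:] for n in range(min_gram, longest + 1)]


names = ["Liebetrau", "Damerau", "Rau"]

# Terms stored in the index for each name (index-time analysis).
index = {name: set(back_edge_ngrams(name)) for name in names}

# Simplification: for a single word, the standard search analyzer
# yields just the lowercased word.
query_term = "rau"
matches = [name for name in names if query_term in index[name]]

print(back_edge_ngrams("Liebetrau"))
print(matches)
```

Every name contains the indexed term "rau", so the simulation gives no way to tell the exact match apart from the longer names.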
You can test your analyzer configurations using the Analyze API.
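As a rough sketch, a request to the Analyze API could be prepared as shown below. The service and index names come from the question; the API key placeholder, the helper name `build_analyze_request`, and the assumption that the Analyze endpoint is available under api-version 2016-09-01 for your service are mine. The request is only constructed here, and the commented-out lines show how it could be sent.

```python
import json
import urllib.request


def build_analyze_request(service, index_name, api_key, text, analyzer,
                          api_version="2016-09-01"):
    """Prepare a POST to /indexes/{index}/analyze (hypothetical helper)."""
    url = (f"https://{service}.search.windows.net/indexes/{index_name}"
           f"/analyze?api-version={api_version}")
    # The body names the text to analyze and the analyzer to run it through.
    body = json.dumps({"text": text, "analyzer": analyzer}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )


req = build_analyze_request("myproduct", "customers-index",
                            "<admin-api-key>", "Liebetrau", "prefix")

# To actually send it (requires a real service and key):
# with urllib.request.urlopen(req) as resp:
#     for token in json.load(resp)["tokens"]:
#         print(token["token"])
```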
To learn more about custom analyzers please go here and here.
Hope that helps
Upvotes: 2