Cody

Reputation: 8954

How can I use NEST QueryString and escape the special characters?

I am using NEST to communicate with Elasticsearch in my applications.

When a user searches for F5503904902, the correct result comes back. However, searching for F5503904902-90190 or F5503904902-90190_55F returns no results.

I assumed this was because of the special characters, so I tried escaping them, but then no results come back either. Is my query correct, or am I doing something wrong? Note that I also append a wildcard to the end of the escaped query so that it matches open-ended terms.

Search Method:

public IPagedSearchResult<MyFileObject> Find(ISearchQuery query)
{
    ElasticClient client = ElasticClientManager.GetClient(_indexCluster, ElasticSearchIndexName.MyFileObjects);
    string queryString = EscapeSearchQuery(query.Query) + "*"; 
    var searchResults = client.Search<MyFileObject>(s => s
        .From(query.Skip)
        .Size(query.Take)
        .QueryString(queryString));

    IPagedSearchResult<MyFileObject> pagedSearchResult = new PagedSearchResult<MyFileObject>();
    pagedSearchResult.Results = searchResults.Documents;
    pagedSearchResult.Skip = query.Skip;
    pagedSearchResult.Take = query.Take;
    pagedSearchResult.Total = Convert.ToInt32(searchResults.Total);

    return pagedSearchResult;
}

Escape Method:

private string EscapeSearchQuery(string query)
{
    if (String.IsNullOrWhiteSpace(query)) return query;

    // Reserved query_string characters to escape; the two-character operators && and || are not handled here.
    char[] special = { '+', '-', '=', '>', '<', '!', '(', ')', '{', '}', '[', ']', '^', '\"', '~', '*', '?', ':', '\\', '/', ' ' };
    char[] qArray = query.ToCharArray();

    StringBuilder sb = new StringBuilder();

    foreach (var chr in qArray)
    {
        if (special.Contains(chr))
        {
            sb.Append(String.Format("\\{0}", chr));
        }
        else
        {
            sb.Append(chr);
        }
    }

    return sb.ToString();
}

I would love any help or pointers on why this isn't working, or on better ways to accomplish this.

Upvotes: 3

Views: 4481

Answers (1)

jhilden

Reputation: 12439

In Elasticsearch, the problem with the dash and underscore is less about escaping and more about analysis: they are characters that can cause a value to be split into separate terms when the field is analyzed. What matters is how the field is indexed. I recommend setting up a multi-field.

https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/multi-fields.html

Here is an example:

PUT hilden1

PUT hilden1/type1/_mapping
{
  "properties": {
    "multifield1": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string", 
          "index": "not_analyzed"
        }
      }
    }
  }
}

POST hilden1/type1
{
  "multifield1": "hello"
}

POST hilden1/type1
{
  "multifield1": "hello_underscore"
}

POST hilden1/type1
{
  "multifield1": "hello-dash"
}
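
To see how an analyzed field would store a value like this, the analyze API can be run before searching. This is only a sketch: it assumes an Elasticsearch 1.x/2.x cluster (consistent with the "string" mapping above) where _analyze accepts the analyzer and text request parameters, and that multifield1 uses the default standard analyzer.

```json
# Assumption: the analyzed main field uses the default "standard" analyzer
GET hilden1/_analyze?analyzer=standard&text=hello-dash

# For comparison, the "keyword" analyzer keeps the whole value as a single token,
# which is roughly what a not_analyzed sub-field stores
GET hilden1/_analyze?analyzer=keyword&text=hello-dash
```

The first request should list two tokens, hello and dash, while the second should return the single token hello-dash.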

Let's try to find the dashed value:

GET hilden1/type1/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "multifield1": "hello-dash"
        }
      }
    }
  }
}

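That returns no results because ES splits the analyzed value into two terms (hello and dash) behind the scenes, and a term filter looks for one exact term, so nothing in the index equals hello-dash. But because we set this field up as a multi-field, we can query the unanalyzed value through the ".raw" sub-field we defined. This query will get the results you're looking for.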

GET hilden1/type1/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "multifield1.raw": "hello-dash"
        }
      }
    }
  }
}
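
Since the original question appends a wildcard to match open-ended values, the same ".raw" sub-field can also be used for prefix matching. The following is a minimal sketch, assuming the mapping and sample documents above. The prefix query is not analyzed, so the dash needs no escaping, and the value "hello-" is only an illustration.

```json
GET hilden1/type1/_search
{
  "query": {
    "prefix": {
      "multifield1.raw": "hello-"
    }
  }
}
```

This should match the hello-dash document but not hello or hello_underscore. Because the raw value is stored unchanged, the prefix is case-sensitive.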

Upvotes: 4
