Dennis Kozevnikoff

Reputation: 2277

How to implement "lowercase" in an ElasticSearch query?

I am trying to implement lowercase functionality in ElasticSearch. According to their API docs here

https://www.elastic.co/guide/en/elasticsearch/reference/current/lowercase-processor.html

you use this code snippet

{
  "lowercase": {
   "field": "foo"
  }
}

in the query to lowercase the value of the specified field.

The docs do not include a complete search example, and I keep getting an error when I execute a search query.

This is what I tried:

POST /users/_search
{
  "size" : 10,
  "_source" : {
    "includes" : [
      "userid",
      "username"
    ]
  },
  "query" : {
    "query_string" : {
      "query" : "*John*",
      "lowercase" : { "default_field" : "username.keyword" }
    }
  },
  "sort" : [
    {
      "_doc" : {
        "order" : "desc"
      }
    }
  ]
}

In the above query I am trying to find the username 'john' (i.e. 'John' converted to lowercase).

Error message is as follows:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "[query_string] unknown token [START_OBJECT] after [lowercase]",
        "line" : 18,
        "col" : 27
      }
    ],
    "type" : "parsing_exception",
    "reason" : "[query_string] unknown token [START_OBJECT] after [lowercase]",
    "line" : 18,
    "col" : 27
  },
  "status" : 400
}

The same query works (although it does not give me the result that I need) if I replace

 "lowercase": { "default_field" : "username.keyword"}

with

  "default_field" : "username.keyword"

Any suggestions about how I can fix this query? Thanks!

Upvotes: 0

Views: 1181

Answers (1)

Joe - Check out my books

Reputation: 16943

The processor you're referencing is part of an ingest pipeline, which lets you transform your data before it gets indexed. It has nothing to do with querying data.

You can simulate an ingest pipeline like so:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "lowercase": {
          "field": "username"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "username": "John"
      }
    }
  ]
}
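If the pipeline runs correctly, the simulate response returns the transformed document with the lowercased value. An abridged response looks roughly like this (the `_ingest` timestamp and metadata fields will differ by version):

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "username": "john"
        },
        "_ingest": {
          "timestamp": "..."
        }
      }
    }
  ]
}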

Check this answer to see pipelines in action.


As you ingest textual data into Elasticsearch, it gets analyzed and tokenized. The default analyzer is the standard analyzer and you can see how it'd tokenize the word "John" when you run:

GET _analyze
{
  "text": "John",
  "analyzer": "standard"
}
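The response should contain a single token, already lowercased by the standard analyzer:

{
  "tokens": [
    {
      "token": "john",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}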

As you can see, it auto-lowercases any input text. This means that when you ingest a single doc into a new index called indexname:

POST indexname/_doc
{
  "username": "John"
}

you can then search lowercase tokens straightaway:

GET indexname/_search
{
  "query": {
    "query_string": {
      "default_field": "username",
      "query": "john*"
    }
  }
}

As a matter of fact, you don't even need the wildcard * at the end.
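You can also drop `query_string` entirely: a plain match query analyzes the query text with the field's own analyzer, so searching for "John" lowercases it to "john" before matching the stored tokens. For example:

GET indexname/_search
{
  "query": {
    "match": {
      "username": "John"
    }
  }
}

This returns the doc above regardless of whether you type "John" or "john".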

Upvotes: 1
