Grammin

Reputation: 12205

How can I find all documents in elasticsearch that contain a number in a certain field?

I have a field of type keyword that can contain either a number or a string. I want to match a document only if that field contains no letters. How can I do this?

My index mapping looks like:

{
  "mappings": {
    "Entry": {
      "properties": {
        "testField": {
          "type": "keyword"
        }
      }
    }
  }
}

My documents look like this:

{
  "testField":"123abc"
}

or

{
  "testField": "456789"
}

I've tried the query:

{
  "query": {
    "range": {
      "testField": {
        "gte": 0,
        "lte": 2000000
      }
    }
  }
}

but it still hits on 123abc. How can I design this so that I only hit the documents with a number in that particular field?

Upvotes: 0

Views: 98

Answers (2)

Val

Reputation: 217274

There is another, more efficient option. You can use an ingest pipeline with a script processor to create a separate numeric field at indexing time, which you can then query more efficiently at search time.

The ingest pipeline below contains a single script processor that creates another field called numField, holding only the digits of testField (every non-digit character is stripped).

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
          ctx.numField = /\D/.matcher(ctx.testField).replaceAll("");
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "testField": "123"
      }
    },
    {
      "_source": {
        "testField": "abc123"
      }
    },
    {
      "_source": {
        "testField": "123abc"
      }
    },
    {
      "_source": {
        "testField": "abc"
      }
    }
  ]
}

Simulating this pipeline with four documents containing a mix of alphanumeric content yields:

{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "123"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "abc123"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "123",
          "testField" : "123abc"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "numField" : "",
          "testField" : "abc"
        },
        "_ingest" : {
          "timestamp" : "2019-05-09T04:14:51.448Z"
        }
      }
    }
  ]
}
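
To move from simulation to real indexing, the pipeline would need to be stored under a name and referenced when documents are indexed. A minimal sketch, assuming a hypothetical pipeline name extract_digits, the Entry type from the question's mapping, and a placeholder index name. Depending on the Elasticsearch version, Painless regexes may also have to be enabled through the script.painless.regex.enabled setting:

```json
# Store the same script processor under a name (extract_digits is a made-up name)
PUT _ingest/pipeline/extract_digits
{
  "description": "Copies the digits of testField into numField",
  "processors": [
    {
      "script": {
        "source": """
          ctx.numField = /\D/.matcher(ctx.testField).replaceAll("");
        """
      }
    }
  ]
}

# Index a document through the pipeline
PUT <your_index_name>/Entry/1?pipeline=extract_digits
{
  "testField": "123abc"
}
```

On versions without mapping types, the document URL would presumably use _doc in place of Entry.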

After indexing your documents through this pipeline, you can run your range query on numField instead of testField. Compared to the other solution (sorry @Kamal), the script runs only once per document at indexing time, instead of on every document every time you search.

{
  "query": {
    "range": {
      "numField": {
        "gte": 0,
        "lte": 2000000
      }
    }
  }
}
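
Note that the script above strips letters rather than rejecting them, so 123abc also ends up with numField 123 and would match this range query. If only purely numeric values should match, one possible variant (a sketch, not part of the original answer) sets numField only when the whole value consists of digits, and maps it as long so that the range compares numbers rather than strings. The pipeline name digits_only, the 18-digit cap, and the placeholder index name are assumptions:

```json
# Map numField explicitly as a number (Entry is the type from the question's mapping)
PUT <your_index_name>
{
  "mappings": {
    "Entry": {
      "properties": {
        "testField": { "type": "keyword" },
        "numField":  { "type": "long" }
      }
    }
  }
}

# Set numField only when testField is made up entirely of digits.
# The 1-18 digit limit keeps Long.parseLong from overflowing.
PUT _ingest/pipeline/digits_only
{
  "processors": [
    {
      "script": {
        "source": """
          if (ctx.testField != null && /\d{1,18}/.matcher(ctx.testField).matches()) {
            ctx.numField = Long.parseLong(ctx.testField);
          }
        """
      }
    }
  ]
}
```

With this variant, documents such as 123abc or abc get no numField at all, so the range query on numField should skip them.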

Upvotes: 1

Kamal Kunjapur

Reputation: 8840

As far as I know, Elasticsearch does not have a direct solution for this.

Instead, you would need to write a script query. Below is what you are looking for:

POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "script": {
            "script": {
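              "lang": "painless",
              "source": """
                  // Documents without a value in testField never match
                  if (doc['testField'].size() == 0) {
                    return false;
                  }
                  try {
                    // parseInt throws NumberFormatException for anything that is not a valid int
                    Integer.parseInt(doc['testField'].value);
                    return true;
                  } catch (NumberFormatException e) {
                    return false;
                  }
              """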
            }
          }
        }
      ]
    }
  }
}
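
If the goal is only to find values made up entirely of digits, a regexp query on the keyword field may avoid scripting altogether. This is a sketch under that assumption, not a replacement for the script above: it checks the shape of the value but applies no numeric range. Lucene regular expressions are anchored to the whole term, so the pattern has to match the entire field value:

```json
POST <your_index_name>/_search
{
  "query": {
    "regexp": {
      "testField": "[0-9]+"
    }
  }
}
```

Against the sample documents from the question, this should match 456789 and skip 123abc.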

Hope it helps!

Upvotes: 1
