Deepak Goyal
Deepak Goyal

Reputation: 4907

Elastic Search analyzer to convert string field to an integer

I have the price in the data and this price is currency formatted price.

"price": "R$ 400,00"

My requirement is to find the data having price >= 200 for e.g. So I added the numeric_analyzer to the price field.

{
  "settings": {
    "analysis": {
      "filter": {
        "number_only": {
          "type": "pattern_replace",
          "pattern": "\\D"
        }
      },
      "analyzer": {
        "numeric_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "number_only"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "price": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          },
          "numeric": {
            "type": "text",
            "analyzer": "numeric_analyzer"
          }
        }
      }
    }
  }
}

This numeric_analyzer will remove all the non-digits. So if the price is "R$ 400,00" then price.numeric field will have the value 40000. Basically the price in cents. But this price will be as String type.

if I use range query on string type then the results will not be accurate. For e.g: Suppose price numeric field has the following prices:

- 12000 (R$ 120,00)
- 14000 (R$ 140,00)
- 130400 (R$ 1.304,00)

With the following range query:

{
  "range": {
    "price.numeric": {
      "gte": "12000",
      "lte": "14000"
    }
  }
}

I will get all the above prices because the prices are in String type.

Is there a way to convert this string type to integer? or any other way to achieve this?

Thanks

Upvotes: 0

Views: 1519

Answers (1)

caster
caster

Reputation: 186

you can use thie pipeline to insert your data to es:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "grok": {
          "field": "price",
          "patterns": [
            "%{WORD:w1}[$] %{NUMBER:w2:double}[,]%{NUMBER:w3:int}"
          ]
        }
      },
      {
        "script": {
          "source": "ctx.res = ctx.w2*1000+ctx.w3"
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "price": "R$ 140,00"
      }
    },
    {
      "_source": {
        "price": "R$ 120,00"
      }
    },
    {
      "_source": {
        "price": "R$ 1.304,00"
      }
    }
  ]
}

Upvotes: 1

Related Questions