Bertuz

Reputation: 2566

How to search for documents containing a substring

I have documents indexed with the following (partial) mapping:

  "message": {
    "type": "text",
    "fields": {
      "keyword": {
        "type": "keyword",
        "ignore_above": 256
      }
    }
  },

I'm trying to query for documents containing "success":"0" with the following DSL query:

{
  "query": {
    "bool": {
      "must": {
        "regexp": {
          "message": ".*\"success\".*0.*"
        }
      }
    }
  }
}

but I don't get any results, whereas if I run the following query:

{
  "query": {
    "bool": {
      "must": {
        "regexp": {
          "message": ".*\"success\""
        }
      }
    }
  }
}

I do get some documents back, e.g.:

{"data":"[{\"appVersion\":\"1.1.1\",\"installationId\":\"any-ubst-id\",\"platform\":\"aaa\",\"brand\":\"Dalvik\",\"screenSize\":\"xhdpi\"}]","executionTime":"0","flags":"0","method":"aaa","service":"myService","success":"0","type":"aservice","version":"1"}

What's wrong with my query?

Upvotes: 0

Views: 201

Answers (1)

Ashraful Islam

Reputation: 12840

The text field message uses the standard analyzer, which splits the input string into tokens.

If we analyze the string "success":"0" with the standard analyzer, we get these tokens:

{
  "tokens": [
    {
      "token": "success",
      "start_offset": 2,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "0",
      "start_offset": 12,
      "end_offset": 13,
      "type": "<NUM>",
      "position": 1
    }
  ]
}

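As you can see, the colon, double quotes, etc. are removed. Since a regexp query is applied to each token individually, and no single token contains both success and 0, your first query does not match. The second one only needs a token ending in success, which is why it returns documents.

The message.keyword sub-field, however, has type keyword. It is not analyzed, so the string is kept as it is: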
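The per-token behaviour can be imitated outside Elasticsearch. The following is a minimal Python sketch, not Elasticsearch code: `rough_standard_tokens` and `regexp_hits` are hypothetical helpers, the tokenizer is only a crude stand-in for the standard analyzer, and Python's `re` is used in place of Lucene's regexp engine. The patterns are written without the double quotes, on the assumption that Lucene's regexp syntax reads a double-quoted run as a literal string rather than as quote characters.

```python
import re

def rough_standard_tokens(text):
    # Crude approximation of the standard analyzer (an assumption):
    # lowercase the text and keep only runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def regexp_hits(pattern, terms):
    # A regexp query has to match a whole term, so fullmatch is used
    # rather than search.
    return [term for term in terms if re.fullmatch(pattern, term)]

message = '"success":"0"'
tokens = rough_standard_tokens(message)  # terms of the analyzed text field

# Analyzed field: each token is tested on its own.
ends_with_success = regexp_hits(r".*success", tokens)
success_then_zero = regexp_hits(r".*success.*0.*", tokens)

# keyword-style field: the whole string is a single term.
keyword_hits = regexp_hits(r".*success.*0.*", [message])

print(tokens, ends_with_success, success_then_zero, keyword_hits)
```

Under these assumptions only the single-term (keyword-style) case can satisfy a pattern that spans both success and 0.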

{
  "tokens": [
    {
      "token": """ "success":"0" """,
      "start_offset": 0,
      "end_offset": 15,
      "type": "word",
      "position": 0
    }
  ]
}

So the query below should work:

{
  "query": {
    "regexp": {
      "message.keyword": """.*"success".*0.*"""
    }
  }
}

Another problem, though, is that the message.keyword field is set to "ignore_above": 256, so it ignores any string longer than 256 characters. Messages longer than that are not indexed in message.keyword and will not be matched by this query.
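As a rough way to check whether a given message would be affected, the length rule can be sketched in Python. `kept_by_keyword` is a hypothetical helper that only mirrors the rule described above (strings longer than ignore_above are skipped); it does not call Elasticsearch.

```python
def kept_by_keyword(value, ignore_above=256):
    # Mirrors the ignore_above rule: a string is skipped for the keyword
    # sub-field only when it is longer than the limit.
    return len(value) <= ignore_above

short_message = '"success":"0"'
long_message = "x" * 257  # one character over the limit

print(kept_by_keyword(short_message))
print(kept_by_keyword(long_message))
```

If the real messages are routinely longer than the limit, raising ignore_above in the mapping (and reindexing) would be one option to consider.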

Upvotes: 1
