elasticsearch, search, elastic-stack, elasticsearch-5

Reputation: 905

Filter elastic search data when fields contain ~

I have a bunch of documents like the one below. I want to filter for documents where projectKey starts with ~. I read some articles saying that ~ is an operator in Elasticsearch query syntax, so it cannot be used directly as a filter value. Can someone help me form the search query for the /branch/_search API?

{
  "_index": "branch",
  "_type": "_doc",
  "_id": "GAz-inQBJWWbwa_v-l9e",
  "_version": 1,
  "_score": null,
  "_source": {
    "branchID": "refs/heads/feature/12345",
    "displayID": "feature/12345",
    "date": "2020-09-14T05:03:20.137Z",
    "projectKey": "~user",
    "repoKey": "deploy",
    "isDefaultBranch": false,
    "eventStatus": "CREATED",
    "user": "user"
  },
  "fields": {
    "date": [
      "2020-09-14T05:03:20.137Z"
    ]
  },
  "highlight": {
    "projectKey": [
      "~@kibana-highlighted-field@user@/kibana-highlighted-field@"
    ],
    "projectKey.keyword": [
      "@kibana-highlighted-field@~user@/kibana-highlighted-field@"
    ],
    "user": [
      "@kibana-highlighted-field@user@/kibana-highlighted-field@"
    ]
  },
  "sort": [
    1600059800137
  ]
}

UPDATE

I used prerana's answer below to add a prefix clause to my query.

Something is still wrong when I use prefix and range together: I get the error below. What am I missing?

GET /branch/_search
{
  "query": {
    "prefix": {
      "projectKey": "~"
    },
    "range": {
      "date": {
        "gte": "2020-09-14",
        "lte": "2020-09-14"
      }
    }
  }
}



{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[prefix] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
        "line": 6,
        "col": 5
      }
    ],
    "type": "parsing_exception",
    "reason": "[prefix] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
    "line": 6,
    "col": 5
  },
  "status": 400
}

Upvotes: 0

Answers (2)

user11935734


While @hansley's answer would work, it requires you to create a custom analyzer. You also mentioned that you want only the docs that start with ~, but his result returns every doc containing ~. So here is my answer, which needs very little configuration and works as required.

The index mapping is the default one: just index the docs below and Elasticsearch will create a default mapping with a .keyword sub-field for every text field.

Index sample docs

{
    "title" : "content1 ~"
}

{
    "title" : "~ staring with"
}

{
    "title" : "in between ~ with"
}

The search query should fetch only the 2nd doc from the sample docs:

{
  "query": {
    "prefix" : { "title.keyword" : "~" }
  }
}

And search result

"hits": [
            {
                "_index": "pre",
                "_type": "_doc",
                "_id": "2",
                "_score": 1.0,
                "_source": {
                    "title": "~ staring with"
                }
            }
        ]
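To illustrate why the prefix query is run against title.keyword rather than title, here is a small Python sketch. It is only a simulation under stated assumptions: approx_standard_tokens is a hypothetical helper that roughly imitates the default analyzer by keeping runs of word characters and lowercasing them, so it is not Elasticsearch's real tokenizer. The point it shows is that a keyword field keeps the whole value as one term, while the analyzed text field no longer contains the ~ character at all.

```python
import re

def approx_standard_tokens(text):
    # Hypothetical, simplified stand-in for the default analyzer:
    # keep runs of word characters and lowercase them. Punctuation
    # such as "~" is dropped, as it is by the real analyzer.
    return [tok.lower() for tok in re.findall(r"\w+", text)]

def keyword_prefix_match(value, prefix):
    # A keyword field indexes the whole value as a single term,
    # so a prefix query behaves like startswith on the full string.
    return value.startswith(prefix)

def text_prefix_match(value, prefix):
    # On an analyzed text field the prefix is compared with each token.
    return any(tok.startswith(prefix) for tok in approx_standard_tokens(value))

docs = ["content1 ~", "~ staring with", "in between ~ with"]

keyword_hits = [d for d in docs if keyword_prefix_match(d, "~")]
text_hits = [d for d in docs if text_prefix_match(d, "~")]

print(keyword_hits)  # ['~ staring with']
print(text_hits)     # []
```

This matches the highlight in the question, where projectKey highlights only "user" while projectKey.keyword highlights the full "~user".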

Please refer to the prefix query documentation for more info.

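Update 1: the query object accepts only one top-level query clause, which is why putting prefix and range side by side fails with a parsing_exception. To combine them, wrap both in a bool query.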

Index Mapping:

{
  "mappings": {
    "properties": {
      "date": {
        "type": "date" 
      }
    }
  }
}

Index Data:

{
    "date": "2015-02-01",
    "title" : "in between ~ with"
}
{
    "date": "2015-01-01",
    "title": "content1 ~"
}
{
    "date": "2015-02-01",
     "title" : "~ staring with"
}
{
    "date": "2015-02-01",
    "title" : "~ in between with"
}

Search Query:

{
    "query": {
        "bool": {
            "must": [
                {
                    "prefix": {
                        "title.keyword": "~"
                    }
                },
                {
                    "range": {
                        "date": {
                            "lte": "2015-02-05",
                            "gte": "2015-01-11"
                        }
                    }
                }
            ]
        }
    }
}

Search Result:

"hits": [
      {
        "_index": "stof_63924930",
        "_type": "_doc",
        "_id": "2",
        "_score": 2.0,
        "_source": {
          "date": "2015-02-01",
          "title": "~ staring with"
        }
      },
      {
        "_index": "stof_63924930",
        "_type": "_doc",
        "_id": "4",
        "_score": 2.0,
        "_source": {
          "date": "2015-02-01",
          "title": "~ in between with"
        }
      }
    ]
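Applied to the index from the question, the same bool structure could be built as in the Python sketch below. This is a minimal illustration, not a tested query against the real index: prefix_and_range_query is a hypothetical helper, and the field name projectKey.keyword is an assumption based on the highlight section of the question's document, which lists that sub-field.

```python
import json

def prefix_and_range_query(prefix_field, prefix, date_field, gte, lte):
    # Hypothetical helper: builds a request body with a single top-level
    # bool clause that holds both the prefix and the range conditions.
    return {
        "query": {
            "bool": {
                "must": [
                    {"prefix": {prefix_field: prefix}},
                    {"range": {date_field: {"gte": gte, "lte": lte}}},
                ]
            }
        }
    }

# Assumed field names, taken from the document shown in the question.
body = prefix_and_range_query(
    "projectKey.keyword", "~", "date", "2020-09-14", "2020-09-14"
)

# The printed JSON can be sent as the body of GET /branch/_search.
print(json.dumps(body, indent=2))
```

If relevance scoring is not needed, the same two clauses can be placed under filter instead of must.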

Upvotes: 1

hansley

Reputation: 290

If I understood your issue correctly, I suggest creating a custom analyzer to search for the special character ~.

I tested this locally as follows, replacing ~ with __SPECIAL__:

I created an index with a custom char_filter and added a sub-field to the projectKey field. The new multi-field is named special_characters.

Here is the mapping:

PUT wildcard-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "special-characters-replacement": {
          "type": "mapping",
          "mappings": [
            "~ => __SPECIAL__"
          ]
        }
      },
      "analyzer": {
        "special-characters-analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "special-characters-replacement"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "projectKey": {
        "type": "text",
        "fields": {
          "special_characters": {
            "type": "text",
            "analyzer": "special-characters-analyzer"
          }
        }
      }
    }
  }
}

Then I ingested the following contents in the index:

"projectKey": "content1 ~"

"projectKey": "This ~ is a content"

"projectKey": "~ cars on the road"

"projectKey": "o ~ngram"

Then, the query was:

GET wildcard-index/_search
{
  "query": {
    "match": {
      "projectKey.special_characters": "~"
    }
  }
}

The response was:

"hits" : [
  {
    "_index" : "wildcard-index",
    "_type" : "_doc",
    "_id" : "h1hKmHQBowpsxTkFD9IR",
    "_score" : 0.43250346,
    "_source" : {
      "projectKey" : "content1 ~"
    }
  },
  {
    "_index" : "wildcard-index",
    "_type" : "_doc",
    "_id" : "iFhKmHQBowpsxTkFFNL5",
    "_score" : 0.3034693,
    "_source" : {
      "projectKey" : "This ~ is a content"
    }
  },
  {
    "_index" : "wildcard-index",
    "_type" : "_doc",
    "_id" : "-lhKmHQBowpsxTkFG9Kg",
    "_score" : 0.3034693,
    "_source" : {
      "projectKey" : "~ cars on the road"
    }
  }
]

Please let me know if you have any issues; I will be glad to help.

Note: This method works only if there is a blank space after the ~. You can see from the response that the 4th document ("o ~ngram") was not returned.
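The behaviour in the note can be reproduced with a rough Python simulation. It is only a sketch under assumptions: apply_char_filter and approx_tokens are hypothetical helpers, and the regular expression merely approximates the standard tokenizer by treating underscores as part of a word. It shows that after the replacement, "~ngram" becomes a single token that differs from the token produced for the query "~".

```python
import re

def apply_char_filter(text):
    # Mirrors the mapping char_filter "~ => __SPECIAL__".
    return text.replace("~", "__SPECIAL__")

def approx_tokens(text):
    # Hypothetical approximation of the standard tokenizer: runs of
    # word characters, with underscores staying attached to letters.
    # No lowercasing, since the custom analyzer has no lowercase filter.
    return re.findall(r"\w+", apply_char_filter(text))

def match(doc, query):
    # A match query hits when the doc shares at least one token with the query.
    return bool(set(approx_tokens(doc)) & set(approx_tokens(query)))

docs = [
    "content1 ~",
    "This ~ is a content",
    "~ cars on the road",
    "o ~ngram",
]

hits = [d for d in docs if match(d, "~")]

print(approx_tokens("~"))         # ['__SPECIAL__']
print(approx_tokens("o ~ngram"))  # ['o', '__SPECIAL__ngram']
print(hits)  # ['content1 ~', 'This ~ is a content', '~ cars on the road']
```

The simulation also makes clear that this approach matches ~ anywhere in the value, not only at the start.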

Upvotes: 2
