darksigma

Reputation: 313

Buggy Regexp Query in ElasticSearch

I've indexed a document whose raw_id field equals 1.2.3.04ABC, and I'm trying to construct a regular expression query to find that document in ES. I'm using the following query:

curl -X POST 'http://localhost:9200/hello/world/_search' -d '{
    "query": {
        "regexp": {
            "raw_id": "1\\.2\\.3\\.04ABC"
        }
    }
}' 

This returns an empty result:

{
    "took":1,
    "timed_out":false,
    "_shards": {
        "total":5,
        "successful":5,
        "failed":0
    },
    "hits": {
        "total":0,
        "max_score":null,
        "hits":[]
    }
}

On the other hand, when I use the slightly modified query

curl -X POST 'http://localhost:9200/hello/world/_search' -d '{
    "query": {
        "regexp": {
            "raw_id": "1\\.2\\.3.*"
        }
    }
}' 

I get the nonempty result:

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "adfafadfafa",
                "_index": "hello",
                "_score": 1.0,
                "_source": {
                    "raw_id": "1.2.3.04ABC"
                },
                "_type": "world"
            }
        ],
        "max_score": 1.0,
        "total": 1
    },
    "timed_out": false,
    "took": 2
}

Can someone please help me understand why the first query doesn't work?

Upvotes: 1

Views: 379

Answers (1)

Val

Reputation: 217274

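My guess is that your raw_id field is an analyzed string, whereas it should be not_analyzed. A regexp query is matched against the indexed terms rather than the original text, and the default analyzer lowercases its tokens, so an analyzed field most likely stores a term containing 04abc, which a pattern ending in 04ABC cannot match. To check this, I used the following mapping, with one analyzed string field id and one not_analyzed string field raw_id: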

curl -XPUT 'http://localhost:9200/hello' -d '{
  "mappings": {
    "world": {
      "properties": {
        "id": {
          "type": "string"
        },
        "raw_id": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'

Then I indexed the following document:

curl -XPUT 'http://localhost:9200/hello/world/1' -d '{
  "id": "1.2.3.04ABC",
  "raw_id": "1.2.3.04ABC"
}'

Now, taking your first query, if I search against the analyzed id field, I get no hits:

curl -XPOST 'http://localhost:9200/hello/world/_search' -d '{
  "query": {
    "regexp": {
      "id": "1\\.2\\.3\\.04ABC"
    }
  }
}'
=> 0 hits KO

However, I do get one hit when I search against the not_analyzed raw_id field:

curl -XPOST 'http://localhost:9200/hello/world/_search' -d '{
  "query": {
    "regexp": {
      "raw_id": "1\\.2\\.3\\.04ABC"
    }
  }
}'
=> 1 hit OK

With your second query, which contains no uppercase letters, I get a hit on each field:

curl -XPOST 'http://localhost:9200/hello/world/_search' -d '{
  "query": {
    "regexp": {
      "id": "1\\.2\\.3.*"
    }
  }
}'
=> 1 hit OK

curl -XPOST 'http://localhost:9200/hello/world/_search' -d '{
  "query": {
    "regexp": {
      "raw_id": "1\\.2\\.3.*"
    }
  }
}'
=> 1 hit OK
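
The four results above can be reproduced outside Elasticsearch with a rough Python sketch. It is not how ES works internally. It assumes that the analyzed field keeps the value as a single token and lowercases it, and it uses Python's re.fullmatch to stand in for a regexp query, which has to match a whole indexed term. The helper names analyzed_term and regexp_matches are made up for this illustration.

```python
import re


def analyzed_term(value: str) -> str:
    # Assumption: the analyzer emits one token and lowercases it.
    return value.lower()


def regexp_matches(pattern: str, term: str) -> bool:
    # A regexp query must match the entire indexed term;
    # re.fullmatch is used here to imitate that anchoring.
    return re.fullmatch(pattern, term) is not None


raw_value = "1.2.3.04ABC"
exact_pattern = r"1\.2\.3\.04ABC"
prefix_pattern = r"1\.2\.3.*"

# Term stored for each field under the assumptions above.
indexed_terms = {
    "id": analyzed_term(raw_value),  # analyzed: lowercased
    "raw_id": raw_value,             # not_analyzed: stored verbatim
}

results = {
    (field, name): regexp_matches(pattern, term)
    for field, term in indexed_terms.items()
    for name, pattern in (("exact", exact_pattern), ("prefix", prefix_pattern))
}

for (field, name), matched in results.items():
    print(f"{field:6} {name:6} -> {'hit' if matched else 'no hit'}")
```

Only the exact pattern on the lowercased term fails, which matches the behaviour observed above. To see the terms an analyzed field really stores, the _analyze API can be run on the value.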

Upvotes: 1
