Elasticsearch: match a string with spaces, colons, and dashes exactly

Question

I'm using Elasticsearch 6.8 and trying to write a query in a Python notebook. Here is the mapping of the index I'm working with:

{
  "mapping": {
    "news": {
      "properties": {
        "dateCreated": {
          "type": "date",
          "format": "yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||epoch_millis"
        },
        "itemId": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "market": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "timeWindow": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "title": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        }
      }
    }
  }
}

I'm trying to search for an exact string like "[2020-08-16 10:00:00.0,2020-08-16 11:00:00.0]" in the "timeWindow" field (which is a "text" field, not a "date" field), and also filter by market="en-us" (market is a "text" field too). The string contains spaces, colons, commas, and other special characters, and I don't know how to write the right query.

At the moment I have this query:

res = es.search(
    index='my_index',
    doc_type='news',
    body={
        'size': size,
        'query': {
            "bool": {
                "must": [
                    {
                        "simple_query_string": {
                            "query": "[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]",
                            "default_operator": "and",
                            "minimum_should_match": "100%"
                        }
                    },
                    {"match": {"market": "en-us"}}
                ]
            }
        }
    }
)

The problem is that it doesn't match the timeWindow string in my "simple_query_string" exactly. I understand that the string gets tokenized, split into parts like "2020", "08", "17", "00", "01", etc., and that each token is analyzed separately. As a result I get timeWindow values that I want to exclude, like:

['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
 '[2020-08-17 00:05:00.0,2020-08-17 01:05:00.0]'
 ...
 '[2020-08-17 00:50:00.0,2020-08-17 01:50:00.0]'
 '[2020-08-17 00:55:00.0,2020-08-17 01:55:00.0]'
 '[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]']

Is there a way to do what I want?

UPD (and answer): My current query uses "term" on "timeWindow.keyword". This combination lets me do an exact search for a string with spaces and other special characters:

res = es.search(index='msn_click_events', doc_type='news', body={
    'size': size,
    'query': {
        "bool": {
            "must": [
                # tw is the exact timeWindow string, e.g.
                # '[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
                {"term": {"timeWindow.keyword": tw}},
                {"match": {"market": "en-us"}}
            ]
        }
    }
})

And this query selects only the right timeWindow values (strings):

['[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]'
 '[2020-08-17 01:00:00.0,2020-08-17 02:00:00.0]'
 '[2020-08-17 02:00:00.0,2020-08-17 03:00:00.0]'
 ...
 '[2020-08-17 22:00:00.0,2020-08-17 23:00:00.0]'
 '[2020-08-17 23:00:00.0,2020-08-18 00:00:00.0]']

Amit · Accepted Answer

Your timeWindow field needs a keyword (exact) search, but you are running a full-text query against it. Because the field is defined as a text field, it is analyzed at index time, as you correctly guessed, so you are not getting the results you expect.
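
One way to check what the analyzer does with a timeWindow value is to ask Elasticsearch for the tokens through the _analyze API. The following is a minimal sketch, assuming an elasticsearch-py client object `es` like the one in the question; the helper name `analyzed_tokens` is made up for illustration.

```python
def analyzed_tokens(es, index, field, text):
    """Return the tokens that the analyzer of `field` produces for `text`.

    `es` is assumed to be an elasticsearch-py client, as in the question.
    The function name is hypothetical, not part of any library.
    """
    response = es.indices.analyze(
        index=index,
        body={"field": field, "text": text},
    )
    # Each entry of response["tokens"] describes one emitted token.
    return [entry["token"] for entry in response["tokens"]]
```

Called with field "timeWindow", this should return several tokens; called with "timeWindow.keyword", it should return the whole string as a single token, which is why only the latter supports an exact match.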

If you are using dynamic mapping, a .keyword sub-field is generated for each text field in the mapping, so you can simply use timeWindow.keyword in your query and it will work.
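
As a rough sketch of such a query, the helper below builds a search body with "term" clauses on the .keyword sub-fields. The function name and the default size are assumptions for illustration. Note that it swaps the question's "match" on market for a "term" on market.keyword and puts both clauses in "filter" rather than "must", so neither affects scoring.

```python
def exact_time_window_query(time_window, market, size=10):
    """Build a search body that matches one exact timeWindow string.

    Hypothetical helper: `term` on a keyword sub-field compares the
    whole, unanalyzed value, so spaces, colons and commas are kept.
    """
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"timeWindow.keyword": time_window}},
                    {"term": {"market.keyword": market}},
                ]
            }
        },
    }


body = exact_time_window_query(
    "[2020-08-17 00:00:00.0,2020-08-17 01:00:00.0]", "en-us"
)
# With a client as in the question:
# res = es.search(index="my_index", doc_type="news", body=body)
```

Because the mapping sets "ignore_above": 256, values longer than 256 characters are not indexed in the keyword sub-field; the timeWindow strings shown in the question are well under that limit.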

If you have defined your own mapping, then you need to add a keyword field to store the timeWindow, reindex the data, and use that keyword field in your query to get the expected results.
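
For the explicit-mapping case, the steps might look roughly like the sketch below, assuming Elasticsearch 6.x (mappings still carry a type name) and an elasticsearch-py client `es`. The helper names are hypothetical, and only timeWindow is mapped explicitly here; in practice the new index would declare the other fields too.

```python
def keyword_mapping(doc_type="news"):
    """Index-creation body giving timeWindow a `keyword` sub-field (6.x style)."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "timeWindow": {
                        "type": "text",
                        "fields": {
                            "keyword": {"type": "keyword", "ignore_above": 256}
                        },
                    }
                }
            }
        }
    }


def reindex_body(source_index, dest_index):
    """Body for the _reindex API: copy every document from source to dest."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


def migrate(es, source_index, dest_index):
    """Create the new index with the keyword sub-field, then copy the data."""
    es.indices.create(index=dest_index, body=keyword_mapping())
    es.reindex(body=reindex_body(source_index, dest_index))
```

After something like migrate(es, "my_index", "my_index_v2") (both index names are placeholders), queries against the new index can use timeWindow.keyword in a "term" clause.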
