Abhishek Sachan

Reputation: 995

Search for multiple values of same field in elasticsearch

I have documents like this:

[{'author': 'edsec',
'awesomeness': 3,
'date': '2017-09-12T07:22:50.033712',
'url': 'http://nakedsecurity.sophos.com/2016/02/11/'},
{'author': '.thea',
'awesomeness': 2,
'date': '2017-09-12T08:22:49.969594',
'url': 'http://www.theage.com.au/victoria/'},
{'author': '.chic',
'awesomeness': 1,
'date': '2017-09-12T09:22:49.896584',
'url': 'http://www.chicagotribune.com/news/'},
{'author': '://ww',
'awesomeness': 1,
'date': '2017-09-12T10:19:58.723068',
'url': 'https://www.theage.com.au/victoria/'},
{'author': '://ww',
'awesomeness': 0,
'date': '2017-09-12T11:19:58.656548',
'url': 'https://www.networkworld.com/article/3028099/security/'},
{'author': '://av',
'awesomeness': 0,
'date': '2017-09-12T12:19:57.589412',
'url': 'https://avien.net/blog/educational-ransomware/'}]

Now I want to query on url and find every occurrence of a given URL, whether it was stored with http or https.

For example, http://www.theage.com.au/victoria/ is saved in both its http and https versions, and I want to discard such duplicates.

I searched a bit and wrote a query, but it does not return the expected results.

result = es.search(index='blogs', doc_type='text',
                   body={
                       "size": 10,
                       "query": {"bool": {
                           "should": [
                               {"term": {"url": final_url}},
                               {"term": {"url": url}}],
                           "minimum_should_match": 1,
                           "boost": 1.0
                       }}
                   })

Here:

url = http://www.networkworld.com/article/3028099/security/
final_url = https://www.networkworld.com/article/3028099/security/

The result is empty: nothing matches, although I should get one of them.

Upvotes: 2

Views: 5699

Answers (2)

Abhishek Sachan

Reputation: 995

I found the answer myself:

    result = es.search(index=self.es_index, doc_type='abc',
                       body={"query": {"bool": {
                           "must": [
                               {"match": {"url": url}},
                               {"match": {"url": url2}}]
                       }}})
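
The answer above needs both the http and the https form of the URL (url and url2). As a minimal sketch, assuming the two forms differ only in their scheme, they could be derived from a single input. The helper names scheme_variants and build_query are hypothetical, not part of the original code or of the Elasticsearch client:

```python
from urllib.parse import urlsplit, urlunsplit


def scheme_variants(url):
    """Return the (http, https) forms of a URL. Hypothetical helper."""
    parts = urlsplit(url)
    return tuple(
        urlunsplit(parts._replace(scheme=scheme)) for scheme in ("http", "https")
    )


def build_query(url):
    """Build the same bool/must/match body as the answer, one clause per scheme."""
    http_url, https_url = scheme_variants(url)
    return {"query": {"bool": {"must": [
        {"match": {"url": http_url}},
        {"match": {"url": https_url}},
    ]}}}


body = build_query("https://www.networkworld.com/article/3028099/security/")
```

The resulting body can then be passed to es.search(..., body=body) exactly as in the answer.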

Upvotes: 4

Lucky Sharma

Reputation: 173

Try this query. If the url field is analyzed, this should work:

{
    "query": {
        "query_string": {
            "query": "url:(http OR https)"
        }
    }
}

Also, when using slashes in the query, make sure you escape them.
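
A rough sketch of that escaping in Python, assuming the URL is sent as a query_string query against the url field. The function name escape_query_string is hypothetical. It backslash-escapes the characters that query_string reserves and drops < and >, which cannot be escaped:

```python
import re

# Characters reserved by the query_string syntax that can be backslash-escaped.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')


def escape_query_string(text):
    """Escape reserved query_string characters. Hypothetical helper."""
    # "<" and ">" cannot be escaped, so remove them first.
    text = text.replace("<", "").replace(">", "")
    # Prefix every remaining reserved character with a backslash.
    return _RESERVED.sub(r"\\\1", text)


escaped = escape_query_string("http://www.theage.com.au/victoria/")
body = {"query": {"query_string": {"default_field": "url", "query": escaped}}}
```

Here escaped holds the URL with each colon and slash preceded by a backslash, so the parser treats them as literal characters rather than as operators.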

Upvotes: 3
