Reputation: 995
I have documents like this in my index:
[{'author': 'edsec',
'awesomeness': 3,
'date': '2017-09-12T07:22:50.033712',
'url': 'http://nakedsecurity.sophos.com/2016/02/11/'},
{'author': '.thea',
'awesomeness': 2,
'date': '2017-09-12T08:22:49.969594',
'url': 'http://www.theage.com.au/victoria/'},
{'author': '.chic',
'awesomeness': 1,
'date': '2017-09-12T09:22:49.896584',
'url': 'http://www.chicagotribune.com/news/'},
{'author': '://ww',
'awesomeness': 1,
'date': '2017-09-12T10:19:58.723068',
'url': 'https://www.theage.com.au/victoria/'},
{'author': '://ww',
'awesomeness': 0,
'date': '2017-09-12T11:19:58.656548',
'url': 'https://www.networkworld.com/article/3028099/security/'},
{'author': '://av',
'awesomeness': 0,
'date': '2017-09-12T12:19:57.589412',
'url': 'https://avien.net/blog/educational-ransomware/'}]
Now I want to query on url and find every occurrence of a URL, whether it was saved with http or https.
For example, http://www.theage.com.au/victoria/ is saved in both its http and https versions, and I want to find such duplicates so that I can discard them.
I searched a bit and wrote the query below, but it does not return the expected results.
result = es.search(index='blogs', doc_type='text',
                   body={
                       "size": 10,
                       "query": {"bool": {
                           "should": [
                               {"term": {"url": final_url}},
                               {"term": {"url": url}}],
                           "minimum_should_match": 1,
                           "boost": 1.0
                       }}
                   })
Here the two variables are:
url = 'http://www.networkworld.com/article/3028099/security/'
final_url = 'https://www.networkworld.com/article/3028099/security/'
The result is empty. Nothing matches, although I should get one of them.
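A likely cause is that a term query is not analyzed, while a text field is tokenized at index time, so a full URL never equals any stored token. The sketch below derives both scheme variants from one URL and builds a terms query against an un-analyzed field. The helper names `scheme_variants` and `build_body` are hypothetical, and `url.keyword` is an assumption: it only exists if the mapping defines a keyword sub-field for url.

```python
from urllib.parse import urlsplit, urlunsplit


def scheme_variants(url):
    """Return the (http, https) forms of `url`; everything but the scheme is kept."""
    parts = urlsplit(url)
    return tuple(urlunsplit(parts._replace(scheme=s)) for s in ("http", "https"))


def build_body(url, field="url.keyword"):
    # ASSUMPTION: `field` is an un-analyzed keyword (sub-)field holding the full URL.
    return {"size": 10, "query": {"terms": {field: list(scheme_variants(url))}}}


body = build_body("https://www.networkworld.com/article/3028099/security/")
# body could then be passed on: es.search(index='blogs', doc_type='text', body=body)
```

Building the body as a plain dict keeps it easy to inspect before it is sent to the cluster.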
Upvotes: 2
Views: 5699
Reputation: 995
I found the answer myself:
result = es.search(index=self.es_index, doc_type='abc',
                   body={"query": {"bool": {
                       "must": [
                           {"match": {"url": url}},
                           {"match": {"url": url2}}],
                   }}})
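Because match analyzes the query text, it may also return documents whose URLs only share some tokens with the ones searched for. If only the exact http/https pair is wanted, the hits can be filtered on the client side. This is a sketch, assuming the usual response layout (hits, then hits, then _source); `exact_url_hits` is a hypothetical helper and the sample response is made up for illustration.

```python
def exact_url_hits(result, wanted_urls):
    """Keep only hits whose stored url is exactly one of `wanted_urls`."""
    wanted = set(wanted_urls)
    return [hit for hit in result["hits"]["hits"]
            if hit["_source"].get("url") in wanted]


# Made-up response in the shape es.search() returns, for illustration only.
sample = {"hits": {"hits": [
    {"_source": {"url": "http://www.theage.com.au/victoria/"}},
    {"_source": {"url": "https://www.theage.com.au/victoria/"}},
    {"_source": {"url": "http://www.chicagotribune.com/news/"}},
]}}

matches = exact_url_hits(sample, ["http://www.theage.com.au/victoria/",
                                  "https://www.theage.com.au/victoria/"])
```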
Upvotes: 4
Reputation: 173
Try this query. If the url field is analyzed, this should work:
{
"query": {
"query_string": {
"query": "url: (http OR https) "
}
}
}
Also, when using slashes in the query, make sure you escape them.
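The escaping can be done with a small helper before the URL is placed into a query_string query. A minimal sketch: `escape_query_string` is a hypothetical name, and the character set follows the reserved characters documented for query_string. The characters < and > cannot be escaped, so they are dropped.

```python
import re

# Characters that query_string treats as syntax; each one gets a leading backslash.
_RESERVED = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/&|])')


def escape_query_string(text):
    """Backslash-escape reserved query_string characters; remove < and >."""
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", text)


escaped = escape_query_string("http://www.theage.com.au/victoria/")
query = {"query": {"query_string": {"query": "url:" + escaped}}}
```

When the body is passed to the Python client as a dict, the client serializes it to JSON, so the backslashes do not need to be doubled by hand.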
Upvotes: 3