Reputation: 110083
I have the following filepath I need to save to ES:
/mnt/qfs-X/Asset_Management/XG_Marketing_/Episodic-SG_1001_1233.jpg
I would like to be able to search the following and get a match:
search = "qf episodic sg_1001 JPG"
In other words, the equivalent search in (My)SQL would be:
select * from table where fp like '%qf%' and fp like '%episodic%'
and fp like '%sg_1001%' and fp like '%jpg%'
Two questions here:
What would be the proper way to store this in my index? Currently I have the very basic (and incorrect) keyword field:
body = {
"mappings": {
"_doc": {
"dynamic": "strict",
"properties": {
"path": {"type": "keyword"},
}
}
}
}
What would be the correct way to search the above in ES? Currently I have:
"query": {
"bool": {
"must": [
{ "match": { "fp": "qf" } },
{ "match": { "fp": "episodic" } },
{ "match": { "fp": "sg_1001" } },
{ "match": { "fp": "JPG" } }
]
}
}
Upvotes: 0
Views: 91
Reputation: 8840
Let's say your input is this:
/mnt/qfs-X/Asset_Management/XG_Marketing_/Episodic-SG_1001_1233.jpg
First, convert every forward slash and underscore into whitespace, so that the input effectively becomes:
mnt qfs-X Asset Management XG Marketing Episodic-SG 1001 1233.jpg
The standard tokenizer, together with the standard and lowercase token filters, then produces the following terms, which are stored in the inverted index and can be queried:
mnt qfs x asset management xg marketing episodic sg 1001 1233 jpg
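To make the analysis chain concrete, here is a minimal Python sketch of what happens to the path. It only approximates the analyzer: the `analyze` function is a hypothetical helper, and the regex split is a stand-in for the standard tokenizer, which actually follows Unicode word-boundary rules.

```python
import re

def analyze(text):
    """Rough stand-in for the analysis chain: char filter, tokenizer, lowercase."""
    # Char filter: '/' and '_' become spaces.
    text = re.sub(r"/|_", " ", text)
    # Tokenizer (approximation): keep runs of letters and digits,
    # so '-' and '.' act as separators for this input.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Token filter: lowercase every term.
    return [t.lower() for t in tokens]

path = "/mnt/qfs-X/Asset_Management/XG_Marketing_/Episodic-SG_1001_1233.jpg"
print(analyze(path))
```

For this particular path the sketch yields the same twelve lowercase terms listed above; other inputs may be tokenized differently by Elasticsearch itself.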
Below is the sample mapping and query for the above:
PUT mysampleindex
{
"settings":{
"analysis":{
"analyzer":{
"my_analyzer":{
"tokenizer":"standard",
"char_filter":[
"my_char_filter"
],
"filter":[
"standard",
"lowercase"
]
}
},
"char_filter":{
"my_char_filter":{
"type":"pattern_replace",
"pattern":"\\/|_",
"replacement":" "
}
}
}
},
"mappings":{
"mydocs":{
"properties":{
"mytext":{
"type":"text",
"analyzer":"my_analyzer"
}
}
}
}
}
POST mysampleindex/mydocs/1
{
"mytext": "/mnt/qfs-X/Asset_Management/XG_Marketing_/Episodic-SG_1001_1233.jpg"
}
POST mysampleindex/_search
{
"query":{
"match":{
"mytext":"qfs episodic sg 1001 jpg"
}
}
}
Keep in mind that when you send the above query, Elasticsearch applies search-time analysis to the query string as well, using the same analyzer as at index time unless you configure a different one. I'd suggest reading the documentation on search-time analysis for more information. That is why you would still get the document with the query string below:
"mytext": "QFS EPISODIC SG 1001 jpg"
Now if you search for a partial word such as pisodic (part of episodic), as in the query below, the search won't return anything, because the inverted index doesn't store the word in that form. For such scenarios I'd suggest using the N-Gram Tokenizer, so that episodic is further broken into terms such as episodi and pisodic, which are then stored in the inverted index.
POST mysampleindex/_search
{
"query":{
"match":{
"mytext":"pisodic"
}
}
}
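To illustrate why n-grams make a partial word such as pisodic searchable, here is a small Python sketch that enumerates the n-grams of a word. The `ngrams` helper is hypothetical, and the sizes 3 and 8 are illustrative values, not Elasticsearch defaults.

```python
def ngrams(word, min_gram, max_gram):
    """Return every substring of word whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(word)):
        for size in range(min_gram, max_gram + 1):
            # Skip sizes that would run past the end of the word.
            if start + size <= len(word):
                grams.append(word[start:start + size])
    return grams

grams = ngrams("episodic", 3, 8)
print("pisodic" in grams, "episodi" in grams)
```

Both checks are true, because each fragment is now a term in its own right. The trade-off is a considerably larger index, since every word is stored many times over.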
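Also note that I have used the text datatype rather than keyword: a keyword field is indexed as a single, unanalyzed term, so none of the analysis above would be applied to it.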
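One more caveat on the match queries above: a match query combines its terms with OR by default, so it also returns documents that contain only some of the terms. If every term must be present, as in the SQL from the question, the match query accepts an operator parameter. Below is a sketch of the request body as a Python dict; `build_and_query` is a hypothetical helper name, and mytext is the field from the sample mapping.

```python
import json

def build_and_query(field, text):
    """Build a match query whose analyzed terms must all be present."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    # The default operator is "or"; "and" requires every term.
                    "operator": "and",
                }
            }
        }
    }

body = build_and_query("mytext", "QFS EPISODIC SG 1001 jpg")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST mysampleindex/_search.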
I hope this helps!
Upvotes: 1