Reputation: 13
I have been trying to figure out how to split a string into words using Elasticsearch. I have tried the word_delimiter filter, but it only seems to work if the string is already delimited, for example "this-is-a-string".
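For illustration, a minimal sketch of the behaviour I mean (this assumes a local Elasticsearch node at localhost:9200 and just calls the standard _analyze API):

    import json
    import urllib.request

    # Ask Elasticsearch to analyze the text with the word_delimiter filter.
    body = json.dumps({
        "tokenizer": "keyword",
        "filter": ["word_delimiter"],
        "text": "this-is-a-string",
    }).encode()

    req = urllib.request.Request(
        "http://localhost:9200/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print([t["token"] for t in json.load(resp)["tokens"]])
    # ['this', 'is', 'a', 'string'] -- but "thisisastring" stays in one piece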
However, my goal is to split strings into words like these examples:
"redcar" => "Red Car"
"greatholiday" => "Great Holiday"
"myhouseisred" => "My house is red"
What would be the best option? Should I use a custom tokenizer?
Any help would be a huge relief. Thanks!
--- Use Case ---
@Elasticsearch Ninja
I have a database of documents. One of the columns contains strings specific to each document; some of those strings contain English words but are not correctly formatted. (There is no way for me to get a copy of already formatted data, because the current format is the only way I can receive the data.)
For example, I have the following columns:
id | text           | document_id
1  | redcar         | 10844
2  | cheaphouses    | 22418
3  | notarealstring | 9821
...
I want to use Elasticsearch, or possibly some other solution, to parse each "text" field and split the string on common English words, so that "redcar" would become "red car", "cheaphouses" would become "cheap houses", and so on.
Upvotes: 1
Views: 1881
Reputation: 32386
What you are trying to achieve is not possible with any tokenizer or custom analyzer in Elasticsearch, because there is no fixed pattern by which to divide your text and create tokens.
As mentioned earlier in the comments, if you try to do this yourself it will be inefficient and mostly done the wrong way, and it will be really difficult to cover all the use cases you might have.
In short, Elasticsearch doesn't provide an out-of-the-box solution; you would have to build these tokens in your application, and that will not be efficient or performant.
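To make the last point concrete, here is a minimal sketch of application-side segmentation, assuming you can supply an English word list (the tiny WORDS set below is illustrative only; a real dictionary file or a ready-made library would be needed in practice):

    # Dictionary-based word segmentation done in the application before indexing.
    # WORDS is a stand-in for a full English word list.
    WORDS = {"red", "car", "cheap", "houses", "great", "holiday",
             "my", "house", "is", "not", "a", "real", "string"}

    def segment(text):
        """Return one dictionary-word segmentation of text, or None if none exists."""
        n = len(text)
        best = [None] * (n + 1)   # best[i] = a segmentation of text[:i], or None
        best[0] = []
        for i in range(1, n + 1):
            for j in range(i):
                if best[j] is not None and text[j:i] in WORDS:
                    best[i] = best[j] + [text[j:i]]
                    break
        return best[n]

    for raw in ("redcar", "cheaphouses", "myhouseisred", "notarealstring"):
        print(raw, "=>", segment(raw))   # e.g. redcar => ['red', 'car']

Even with a full dictionary, ambiguous inputs can segment in more than one way, which is part of why covering every use case generically is so hard.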
Upvotes: 1