Reputation: 1
Apache OpenNLP: can I get the correct token from the binary model file?
If the input is "hosr road", which contains a typo (the correct text is "hosur road"), can I get the corrected "hosur road" back as tokens after looking it up in the binary model file?
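import java.io.InputStream;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
// Load the pre-trained English tokenizer model and split the input into tokens.
String input = "hosr road";
InputStream tokenModelIn = getClass().getClassLoader().getResourceAsStream("META-INF/nlp/en-token.bin");
TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
Tokenizer tokenizer = new TokenizerME(tokenModel);
String[] tokens = tokenizer.tokenize(input);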
Thanks in advance.
Upvotes: 0
Views: 915
Reputation: 9068
Short answer: No, you can't.
The OpenNLP language models are not a dictionary to correct spelling in a given language. Moreover, "tokenization" is not the same as "spell correction". Tokens just represent the fragments of a sentence, so tokenization - as a natural language processing step - just gives you these fragments, even if they are misspelled. It won't correct those.
You could try another API or framework if you would like to have spell correction done on your text data. Maybe have a look at Lucene and this StackOverflow post.
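To make the difference concrete, here is a minimal sketch of what a separate spell-correction step could look like in plain Java, without OpenNLP or Lucene. It splits the input on whitespace and replaces each token with the closest word from a dictionary, measured by Levenshtein edit distance. The class `SpellSketch`, its methods, the tiny word list, and the threshold of one edit are all assumptions made up for illustration; a real setup would need a proper dictionary and probably a library such as Lucene's spell checker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of OpenNLP or Lucene.
public class SpellSketch {

    // Levenshtein (edit) distance between two strings, computed row by row.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest dictionary word within maxDistance edits,
    // or the token unchanged if no word is close enough.
    public static String correct(String token, List<String> dictionary, int maxDistance) {
        String best = token;
        int bestDistance = maxDistance + 1;
        for (String word : dictionary) {
            int d = editDistance(token, word);
            if (d < bestDistance) {
                bestDistance = d;
                best = word;
            }
        }
        return best;
    }

    // Splits on whitespace, then corrects each token (assumed threshold: 1 edit).
    public static List<String> correctAll(String input, List<String> dictionary) {
        List<String> out = new ArrayList<>();
        for (String token : input.trim().split("\\s+")) {
            out.add(correct(token, dictionary, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed toy dictionary; the tokenizer model contains nothing like this.
        List<String> dictionary = List.of("hosur", "road", "bangalore");
        System.out.println(correctAll("hosr road", dictionary)); // prints [hosur, road]
    }
}
```

The point is that the correction comes from the word list and the distance measure, neither of which is contained in en-token.bin. A linear scan over the dictionary like this is only reasonable for small word lists.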
Upvotes: 1