user1544198

Reputation: 1

How to get a spell corrected token from the 'binary file' in Apache OpenNLP?

With Apache OpenNLP, can I get the correct token from the binary model file?

If the input is "hosr road", which contains a typo (the correct text is "hosur road"), can I get "hosur road" back as the tokens after a lookup in the binary file?

    String input = "hosr road";
    // Load the pre-trained English tokenizer model from the classpath
    InputStream tokenModelIn = getClass().getClassLoader().getResourceAsStream("META-INF/nlp/en-token.bin");
    TokenizerModel tokenModel = new TokenizerModel(tokenModelIn);
    Tokenizer tokenizer = new TokenizerME(tokenModel);
    // Split the input into tokens
    String[] tokens = tokenizer.tokenize(input);

Thanks in advance.

Upvotes: 0

Views: 915

Answers (1)

MWiesner

Reputation: 9068

Short answer: No, you can't.

The OpenNLP language models are not a dictionary to correct spelling in a given language. Moreover, "tokenization" is not the same as "spell correction". Tokens just represent the fragments of a sentence, so tokenization - as a natural language processing step - just gives you these fragments, even if they are misspelled. It won't correct those.

If you need spell correction for your text data, you could try a different API or framework. Maybe have a look at Lucene and this StackOverflow post.
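To illustrate the difference between the two steps, here is a minimal plain-Java sketch of dictionary-based correction applied after tokenization. It is not the OpenNLP or Lucene API: the class `SpellSketch`, its methods, and the tiny word list are hypothetical, the input is simply split on whitespace instead of using a trained tokenizer model, and each token is compared to the word list by Levenshtein edit distance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class SpellSketch {

    // Hypothetical word list (an assumption for this sketch); a real
    // application would load its own vocabulary, e.g. known street names.
    public static final List<String> DICTIONARY =
            Arrays.asList("hosur", "road", "house", "horse");

    // Levenshtein distance: minimum number of single-character insertions,
    // deletions and substitutions needed to turn a into b.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest dictionary word within maxDistance edits,
    // or the token unchanged if no word is close enough.
    public static String correct(String token, List<String> dictionary, int maxDistance) {
        String best = token;
        int bestDistance = maxDistance + 1;
        for (String word : dictionary) {
            int d = editDistance(token.toLowerCase(), word);
            if (d < bestDistance) {
                bestDistance = d;
                best = word;
            }
        }
        return best;
    }

    // Step 1: tokenize (here: whitespace split). Step 2: correct each token.
    public static String correctAll(String input) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : input.trim().split("\\s+")) {
            out.add(correct(token, DICTIONARY, 1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(correctAll("hosr road")); // prints "hosur road"
    }
}
```

The correction only works because "hosur" is in the word list; the tokenizer model itself contributes nothing to it. For real data, a library with a proper spell checker (such as the Lucene suggestion above) is likely a better fit than this hand-rolled lookup.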

Upvotes: 1
