Reputation: 4222
I have a doubt about how to parse any text into valid sentence.
Suppose a text is given iamjhamb
and parse into i am jhamb
My approach: I solved this using Dynamic programmnig,
Make an array T[], where T[i] shows string from 0 to i made any valid setence or not
formula is T[i] = 1 iff T[j] = 1 and substring(j+1, i) is a word in dictionary for all
j < i.
But this approach is not totally correct, it gives all possible words form from this text, as this is not the demand of this questioin. So, please help me to correct this approach, or suggest any other good approach.
I have one more doubt, i searched a lot on net about Suffix array, but I didn't get any good tutorial. So make me understand that concept, or suggest any good link. Thanks in advance.
Upvotes: 0
Views: 200
Reputation: 3744
This problem is known as the word segmentation problem in natural language processing. While this problem rarely arises for English, it is quite common for Arabic or Chinese. You could review the literature on the subject and consider adapting one of the methods to your case.
As for your algorithm, the simplest thing to do would be to enumerate the possible segmentations it produces and select one using a language model. I think a bigram model might suffice for simple sentences.
Suffix tree would allow you to find the possible segmentations more efficiently, but would not help identifying the most likely one, unless you go for a language model based on suffix trees.
Upvotes: 1