devsda
devsda

Reputation: 4222

parse text into valid sentence

I have a doubt about how to parse any text into valid sentence.

Suppose a text is given iamjhamb and parse into i am jhamb

My approach: I solved this using Dynamic programmnig, 
             Make an array T[], where T[i] shows string from 0 to i made any valid setence or not
             formula is T[i] = 1 iff T[j] = 1 and substring(j+1, i) is a word in dictionary for all
             j < i.

But this approach is not totally correct, it gives all possible words form from this text, as this is not the demand of this questioin. So, please help me to correct this approach, or suggest any other good approach.

I have one more doubt, i searched a lot on net about Suffix array, but I didn't get any good tutorial. So make me understand that concept, or suggest any good link. Thanks in advance.

Upvotes: 0

Views: 200

Answers (2)

Qnan
Qnan

Reputation: 3744

This problem is known as the word segmentation problem in natural language processing. While this problem rarely arises for English, it is quite common for Arabic or Chinese. You could review the literature on the subject and consider adapting one of the methods to your case.

As for your algorithm, the simplest thing to do would be to enumerate the possible segmentations it produces and select one using a language model. I think a bigram model might suffice for simple sentences.

Suffix tree would allow you to find the possible segmentations more efficiently, but would not help identifying the most likely one, unless you go for a language model based on suffix trees.

Upvotes: 1

noMAD
noMAD

Reputation: 7844

Have you tried constructing a trie for the String? Read about them here. It will work except for cases where there are multiple choices to choose from. Example: aneat can be a neat or an eat.

Upvotes: 0

Related Questions