Nathan

Reputation: 1483

What type of Trie is this?

I want to add words to an open-source Java word-splitting program for Khmer (a language that does not put spaces between words). The developers have not worked on it in a long time, and I haven't been able to contact them for details (http://sourceforge.net/projects/khmer/files/Khmer%20Word%20Breaking/Khmer%20Word%20Breaking%20program%20V1.0/). Supposedly the word list was created from a Khmer dictionary, and I would like to re-create the file to include more words.

Can anyone identify what format the word dictionary is in (I believe it is some type of Trie)? Here are the first few lines:

0ឳមអគណជយឍឫហកដពទឱលថឦឡញឩខនឧផប។ឋវឭឈឃឥឌឰឪសងចភធឯតឆរ
1ទ
0ក
1
1ីែមគួណជយ៍ៀហកទុលេញ៉ឺនំឹៃូឈឃោាឿសងចិ្ធើតៅរ
1គនសងរ
0ទ
0ា
0យ
0ព
0ន
1
1រ
0ា
0ស
0ី
1

Also, does anyone know how I would go about making a new one? I have a large word list, but I am not sure how to get it into this format.

Thanks!

Upvotes: 0

Views: 160

Answers (1)

ideally_world

Reputation: 436

After a quick look through the code, I have a theory.

Create a SearchTree (which extends TreeItem). For each word in your dictionary, call addWord, which SearchTree inherits from TreeItem. When the iteration is done, call export on the SearchTree, then use the exported file as the word input file.
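To make the idea concrete, here is a rough, self-contained sketch of a trie with an add step and an export step. It does not use the project's real classes: `TrieSketch`, `Node`, and the method signatures are hypothetical stand-ins, and the real TreeItem/SearchTree API may look different. The output layout is only a guess from the sample lines in the question: one line per node in depth-first order, a leading `1` or `0` for whether a word ends at that node, followed by the characters of its children. The sketch also sorts the children, which the sample file does not appear to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration; NOT the project's TreeItem/SearchTree.
class TrieSketch {

    static final class Node {
        boolean endsWord;                                       // true if a word ends at this node
        final Map<Character, Node> children = new TreeMap<>();  // sorted for repeatable output
    }

    // Walk or create one node per character, then mark the last node as a word end.
    static void addWord(Node root, String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.endsWord = true;
    }

    // Assumed format: flag ('1' or '0') followed by this node's child characters,
    // then each child's subtree in the same order (preorder traversal).
    static void writeNode(Node node, List<String> out) {
        StringBuilder line = new StringBuilder(node.endsWord ? "1" : "0");
        for (char c : node.children.keySet()) {
            line.append(c);
        }
        out.add(line.toString());
        for (Node child : node.children.values()) {
            writeNode(child, out);
        }
    }

    static List<String> export(Node root) {
        List<String> out = new ArrayList<>();
        writeNode(root, out);
        return out;
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (String word : new String[] {"ab", "abc", "b"}) {
            addWord(root, word);
        }
        // Prints one line per node: 0ab, 0b, 1c, 1, 1
        export(root).forEach(System.out::println);
    }
}
```

If the guess about the format is right, running the same steps over the existing word list should produce a file with the same structure as the original (child order aside), which would be a quick way to check it before adding new words.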

Additionally, there may be an undocumented parameter for khwrdbrk.jar, --create, that will read the words for the new tree from standard input.

Again, just a theory, but let me know what happens if you test it out.

Upvotes: 1
