The ICU4X icu_segmenter::WordSegmenter seems like the best word segmenter out there.
I don't understand how data providers fit into word segmentation at all. The setup seems very complicated to me, and I couldn't find any examples.
I need it for Thai. I assume it uses the LSTM segmenter by default, and out of the box it is better than anything I've seen before. It still has trouble with a lot of exotic names, which is why I'd like to add my own dictionary to it for personal use.
How can I do that?