mash

Reputation: 2526

How to combine the icu4x word segmenter with an additional dictionary?

The icu4x icu_segmenter::WordSegmenter seems like the best word segmenter out there.

I don't understand how data providers work with word segmentation. The setup seems very complicated to me, and I couldn't find any examples.

I need it for Thai, where I believe it uses the LSTM segmenter by default. Out of the box it is better than anything I've seen before, but it still has trouble with many uncommon names, which is why I'd like to add my own dictionary to it for personal use.

How can I do that?
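To make the goal concrete: a fallback that avoids data providers entirely would be to post-process the breakpoints the segmenter returns. The following is only a minimal sketch of that idea, not an icu4x feature. It assumes the breakpoints are byte offsets into the text, and `merge_with_dictionary` is a hypothetical helper name; it uses only the Rust standard library.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of icu4x): given breakpoints produced by
/// any word segmenter (assumed to be byte offsets into `text`), force every
/// occurrence of a dictionary word to be kept as a single segment.
fn merge_with_dictionary(text: &str, breakpoints: &[usize], dictionary: &[&str]) -> Vec<usize> {
    // A sorted set keeps the result ordered and free of duplicates.
    let mut result: BTreeSet<usize> = breakpoints.iter().copied().collect();

    for word in dictionary {
        if word.is_empty() {
            continue;
        }
        // `match_indices` yields non-overlapping matches with byte offsets.
        for (start, matched) in text.match_indices(*word) {
            let end = start + matched.len();
            // Drop any breakpoint that falls strictly inside the match...
            result.retain(|&b| b <= start || b >= end);
            // ...and make sure the match itself is delimited.
            result.insert(start);
            result.insert(end);
        }
    }

    result.into_iter().collect()
}
```

For example, with the text "abJohnSmithcd", breakpoints `[0, 2, 6, 11, 13]` and the dictionary `["JohnSmith"]`, the breakpoint at 6 is removed so the name stays in one piece. This sketch does not resolve conflicts between overlapping dictionary entries, and it is not what I'm really after, since the dictionary never influences the segmenter itself.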

Upvotes: 3

Views: 76

Answers (0)
