Reputation: 3382
Which Lucene analyzer can handle Japanese text properly? It should be able to handle Kanji, Hiragana, Katakana, Romaji, and any combination of them.
Upvotes: 8
Views: 3489
Reputation: 12678
I found lucene-gosen while searching for the same thing for my own purposes.
Their example looks fairly decent, but I guess it's the kind of thing that needs extensive testing. I'm also worried about their backwards-compatibility policy (or rather, the complete lack of one).
Upvotes: 3
Reputation: 82994
You should probably look at the CJK package in the contrib area of Lucene. It provides an analyzer and a tokenizer specifically for Chinese, Japanese, and Korean text.
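For anyone wondering what the CJK tokenizer does with Japanese: as far as I know it does not attempt dictionary-based word segmentation. It emits overlapping two-character tokens (bigrams) for runs of CJK characters and keeps Latin runs such as Romaji as whole, lowercased words. Here is a rough plain-Java sketch of that idea. It is not Lucene's code; the class name `CjkBigramSketch` and its methods are made up for illustration, and the script check is deliberately simplified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of bigram tokenization; not part of Lucene.
final class CjkBigramSketch {

    // Simplified assumption: treat Han (Kanji), Hiragana, and Katakana as "CJK".
    // Marks that Unicode assigns to the COMMON script are not covered here.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                // Collect a run of CJK characters.
                int start = i;
                while (i < cps.length && isCjk(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    // A lone CJK character becomes a single-character token.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Emit every overlapping pair of adjacent characters.
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                // Non-CJK letters and digits (e.g. Romaji) stay together as one word.
                int start = i;
                while (i < cps.length
                        && Character.isLetterOrDigit(cps[i])
                        && !isCjk(cps[i])) {
                    i++;
                }
                tokens.add(new String(cps, start, i - start).toLowerCase(Locale.ROOT));
            } else {
                // Whitespace and punctuation only separate tokens.
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The three-Kanji run yields two overlapping bigrams, then the Latin word.
        System.out.println(tokenize("東京都 Lucene"));
    }
}
```

The upside of bigrams is that no dictionary is needed and mixed Kanji/Kana/Romaji input is handled uniformly. The downside is a larger index and some false matches, which is why a morphological analyzer can be worth testing as well.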
Upvotes: 4