Reputation: 571
I have used Lucene to index documents and provide search over them, but that work was in English. I now have a project in Kurdish. Kurdish uses some Arabic Unicode characters plus several additional characters; here is a table of the Unicode characters used in the Kurdish-Arabic script.
My question: how do I create an Analyzer for this language, or can I use the Arabic Analyzer for this purpose?
Upvotes: 3
Views: 694
Reputation: 3683
To answer your question about how to create a custom Analyzer for a new language: the book "Lucene in Action" covers the creation of custom analyzers in considerable detail. You can reuse much of the code found in the existing analyzers and change only what you need. Lucene is open source and very extensible, so making these changes is fairly easy.
Upvotes: 1
Reputation: 26703
Lucene has a list of other analyzers, including one for Arabic. I'm afraid none of them specifically targets Kurdish, but maybe you can extend the Arabic analyzer to fit your needs?
Just bear in mind that all these analyzers come separately from the main Lucene distribution.
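To give a rough idea of what "extending" might involve: an Arabic-oriented pipeline will not necessarily treat Arabic and Kurdish letter variants as the same character, so one likely step is folding them before indexing. The sketch below is plain Java with no Lucene dependency. The class name `KurdishTextFolding` is hypothetical, and the mapping (Arabic KAF to KEHEH, Arabic YEH and ALEF MAKSURA to FARSI YEH, TATWEEL dropped) is an assumption about what your corpus needs, not a complete description of Kurdish orthography. In a real analyzer the same logic would probably live in a custom token filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: folds Arabic-specific letter forms into the forms
// assumed to be used in Kurdish text, then splits on whitespace.
final class KurdishTextFolding {

    // Fold character variants so that differently typed spellings compare equal.
    // The mapping below is an illustrative assumption, not an exhaustive list.
    static String fold(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\u0643': // ARABIC LETTER KAF
                    out.append('\u06A9'); // ARABIC LETTER KEHEH
                    break;
                case '\u064A': // ARABIC LETTER YEH
                case '\u0649': // ARABIC LETTER ALEF MAKSURA
                    out.append('\u06CC'); // ARABIC LETTER FARSI YEH
                    break;
                case '\u0640': // ARABIC TATWEEL (elongation mark), dropped
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    // Fold the text, then split it on runs of whitespace, skipping empty pieces.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : fold(text).split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

Calling `KurdishTextFolding.tokenize(text)` on both the indexed documents and the query strings keeps the two sides consistent, which is the same reason Lucene applies one analyzer at index time and at search time.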
Upvotes: 1