This question is a follow-up to "Java string searching ignoring accents".
The answer to the original question shows how to remove diacritics from strings. For instance, köln becomes koln. But łódź becomes łodz: note that the l with stroke is left unchanged.
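The approach described above can be sketched as follows. This is a minimal sketch that assumes the linked answer uses the common "decompose to NFD, then delete combining marks" technique; the class and method names are hypothetical. It reproduces the behaviour from the question: ó, ź and ö decompose into a base letter plus a combining mark, but ł has no decomposition, so it survives.

```java
import java.text.Normalizer;

public class StripDiacritics {
    // Assumed technique: NFD splits precomposed letters (e.g. ó -> o + U+0301),
    // then every combining mark (Unicode general category M) is deleted.
    static String stripDiacritics(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("köln")); // koln
        System.out.println(stripDiacritics("łódź")); // łodz: ł has no decomposition, so it is kept
    }
}
```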
My question is how can I remove the stroke as well, so that łódź becomes lodz?
Thanks.
As tchrist suggested, I tried ICU (version 50.1), but it did not recognize ł as derived from L either. The L with stroke seems to be a special case in Unicode. See http://bugs.mysql.com/bug.php?id=11369: according to that report, it was not connected to L in Unicode 4.0, while in Unicode 4.1 it is. I wonder whether anyone has tested the problem with a Java library based on Unicode 4.1.
You cannot, at least not trivially for all such letters. The letter ł is (apart from its appearance and its Unicode name) not linked to l at all, at least in Unicode; linguistically that is a different matter.
Your only option may be a conversion table, tailored to your use case, that you fill with all the characters you need to convert.
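Such a table can be combined with the usual NFD stripping: the accents that do decompose are removed first, and whatever is left over is looked up in the table. A minimal sketch, assuming Java 9+ (for `Map.of`); the class name `Fold`, the method `fold` and the table contents are hypothetical, and the table is deliberately incomplete, so you would fill it with the letters that actually occur in your data.

```java
import java.text.Normalizer;
import java.util.Map;

public class Fold {
    // Hypothetical, incomplete table for letters that have no Unicode
    // decomposition and therefore survive NFD stripping unchanged.
    private static final Map<Character, String> TABLE = Map.of(
            'ł', "l", 'Ł', "L",
            'đ', "d", 'Đ', "D",
            'ø', "o", 'Ø', "O",
            'ß', "ss");

    static String fold(String s) {
        // Step 1: remove decomposable accents (ó -> o, ź -> z, ö -> o).
        String stripped = Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");
        // Step 2: replace any remaining letter found in the table; keep the rest as-is.
        StringBuilder sb = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            sb.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("łódź"));   // lodz
        System.out.println(fold("Straße")); // Strasse
    }
}
```

The table maps to `String` rather than `char` so that a letter can expand to more than one character, as ß does.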