Reputation: 4288
I need a regex that works for all alphabets. I have an input and a target text, and both can belong to different alphabets: Chinese, Latin, Cyrillic, or any other.
I need a regex for multi-language input and multi-language target text.
Does anybody have an idea about this? How can I write this regex?
I will use this with JavaScript, but I think there should be a common regex that works in both Java and JavaScript for this problem.
Upvotes: 1
Views: 1955
Reputation: 4288
I use the "|" character as a separator, so it is special for me. A key can contain any character except "|". The following solved my problem; thanks for the answers. It can be used with JavaScript, Java, and Groovy. I tested it and it worked.
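// Any run of characters except "|" (U+007C), within U+0000 to U+FFEF
var keyPrefix = "\\|[\u0000-\u007B\u007D-\uFFEF]*";
var keySuffix = "[\u0000-\u007B\u007D-\uFFEF]*\\|";
// Pattern source for one "|"-delimited field that contains the lowercased key
var searchkey = keyPrefix + key.toLowerCase() + keySuffix;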
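As a minimal sketch of how the pattern above might be used, assuming the target text is already lowercased and "|"-delimited: the helper names `escapeRegExp` and `buildFieldRegex` are hypothetical, and `[^|]*` is swapped in as a shorter way to say "any characters except the separator". Escaping the key is an addition not in the original snippet; it keeps characters such as "." or "(" in a key from being read as regex syntax.

```javascript
// Hypothetical helper: escape regex metacharacters in a user-supplied key.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a regex matching one "|"-delimited field
// that contains `key`, whatever script the key is written in.
function buildFieldRegex(key) {
  var notPipe = "[^|]*"; // any run of characters except the separator
  return new RegExp("\\|" + notPipe + escapeRegExp(key.toLowerCase()) + notPipe + "\\|");
}

// Sample data, made up for illustration.
var target = "|привет мир|hello world|你好世界|";

console.log(buildFieldRegex("мир").test(target));      // true
console.log(buildFieldRegex("世界").exec(target)[0]);   // "|你好世界|"
console.log(buildFieldRegex("a|b").test(target));      // false
```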
Upvotes: 0
Reputation: 92976
If you are in Java (not JavaScript!), you can use Unicode properties, e.g.
\p{L}
which matches any kind of letter from any language. (The uppercase form \P{L} is the negation: anything that is not a letter.)
See regular-expressions.info/unicode for more information.
For JavaScript:
There is the XRegExp library and its XRegExp Unicode plugins, which extend JavaScript's regex features by adding support for Unicode categories, scripts, and blocks.
With those libs you would be able to use \p{L}
in JavaScript.
See my answer to this question for a small example
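For what it's worth, engines that implement ES2018 Unicode property escapes accept \p{L} natively when the regex carries the `u` flag, so a library is only needed on older runtimes. A small sketch, assuming such an engine; the sample strings are made up for illustration:

```javascript
// Assumes an ES2018+ engine: \p{...} is only recognized with the "u" flag.
var lettersOnly = /^\p{L}+$/u;

console.log(lettersOnly.test("hello"));   // true
console.log(lettersOnly.test("привет"));  // true
console.log(lettersOnly.test("你好"));     // true
console.log(lettersOnly.test("abc123"));  // false (digits are not letters)

// Pull runs of letters out of mixed-script text.
var words = "hello, мир! 你好 42".match(/\p{L}+/gu);
console.log(words); // ["hello", "мир", "你好"]
```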
Upvotes: 4
Reputation: 56162
Some regex engines support a special character class for all Unicode letters:
\p{L}
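Or you can use \w
- letter, digit, underscore. Note that in JavaScript, and in Java by default, \w only covers ASCII ([A-Za-z0-9_]), so it will not match Cyrillic or Chinese characters.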
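To illustrate the difference in JavaScript, here is a short sketch that compares \w with a Unicode-aware equivalent built from \p{L} and \p{N}. It assumes an ES2018+ engine; the variable names are only for illustration.

```javascript
// \w in JavaScript is ASCII-only: [A-Za-z0-9_]
var asciiWord = /^\w+$/;
// A Unicode-aware stand-in: any letter, any number, or underscore.
var unicodeWord = /^[\p{L}\p{N}_]+$/u;

console.log(asciiWord.test("word_1"));     // true
console.log(asciiWord.test("слово_1"));    // false
console.log(unicodeWord.test("слово_1"));  // true
console.log(unicodeWord.test("单词_1"));    // true
```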
Upvotes: 2