arsa

Reputation: 59

How to support internationalization for String validation?

In my program I have a regex which ensures that an input string contains at least one letter and one digit and is between 2 and 10 characters long.

Pattern p = Pattern.compile("^(?=.\\d)(?=.[A-Za-z])[A-Za-z0-9]{2,10}$");

Per a new requirement, it needs to support internationalization. How can this be done?

To support internationalization of the messages, I have used a resource bundle with properties files containing the translated, hard-coded text. But I am not sure how to achieve the same for string validation.

Upvotes: 4

Views: 942

Answers (2)

stema

Reputation: 92986

What you need is Unicode!

Unicode properties

Pattern p = Pattern.compile("^(?=.*\\p{Nd})(?=.*\\p{L})[\\p{L}\\p{Nd}]{2,10}$");

\p{L} and \p{Nd} are Unicode properties, where

\p{L} is any kind of letter from any language

\p{Nd} is a digit zero through nine in any script except ideographic scripts

For more details on Unicode properties see regular-expressions.info
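To see the effect of the property-based pattern, here is a minimal sketch that runs it against a few inputs. The class name UnicodePropertyValidator and the sample strings are my own assumptions for illustration; the non-ASCII characters are written as \u escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
final class UnicodePropertyValidator {
    // At least one decimal digit (\p{Nd}), at least one letter (\p{L}),
    // and 2 to 10 characters in total, all of them letters or digits.
    private static final Pattern VALID =
            Pattern.compile("^(?=.*\\p{Nd})(?=.*\\p{L})[\\p{L}\\p{Nd}]{2,10}$");

    static boolean isValid(String input) {
        return VALID.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("abc123"));                                // true
        // Cyrillic letters followed by an ASCII digit
        System.out.println(isValid("\u043F\u0430\u0440\u043E\u043B\u044C7")); // true
        // Latin letter followed by ARABIC-INDIC DIGIT THREE (U+0663)
        System.out.println(isValid("a\u0663"));                               // true
        System.out.println(isValid("abcdef"));                                // false, no digit
        System.out.println(isValid("a_1"));                                   // false, '_' is neither letter nor digit
    }
}
```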

Pattern.UNICODE_CHARACTER_CLASS

There is also a flag, Pattern.UNICODE_CHARACTER_CLASS (new in Java 7), that enables the Unicode versions of the predefined character classes; see my answer here for some more details and links.

You could do something like this

Pattern p = Pattern.compile("^(?=.*\\d)(?=.*\\p{L})\\w{2,10}$", Pattern.UNICODE_CHARACTER_CLASS);

With this flag, \w matches letters and digits from any language (and also a few other word characters, such as combining marks and connector punctuation like _).
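A small sketch of what the flag changes, comparing the same \w and \d classes compiled with and without it. The class name UnicodeCharacterClassDemo and the sample strings are assumptions made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
final class UnicodeCharacterClassDemo {
    // Default behaviour: \w is [a-zA-Z_0-9] and \d is [0-9].
    static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    static final Pattern ASCII_DIGIT = Pattern.compile("\\d");

    // With the flag, the same classes follow the Unicode definitions.
    static final Pattern UNICODE_WORD =
            Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS);
    static final Pattern UNICODE_DIGIT =
            Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        String uber = "\u00FCber";  // "uber" with a u-umlaut as first letter
        String arabicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE

        System.out.println(ASCII_WORD.matcher(uber).matches());          // false
        System.out.println(UNICODE_WORD.matcher(uber).matches());        // true
        System.out.println(ASCII_DIGIT.matcher(arabicThree).matches());  // false
        System.out.println(UNICODE_DIGIT.matcher(arabicThree).matches()); // true
        // The underscore is a word character in both modes.
        System.out.println(UNICODE_WORD.matcher("a_1").matches());       // true
    }
}
```

Note that \w still accepts the underscore, so if only letters and digits are allowed, the explicit [\p{L}\p{Nd}] class is the stricter choice.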

Error in your regex

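I also changed your regex a bit. Each of your original lookaheads ((?=.\d)(?=.[A-Za-z])) matches exactly one arbitrary character and then requires the next one, so together they demand that the second character is both a digit and a letter. That is impossible, so the pattern can never match. My version adds the * quantifier, so each lookahead checks for its character anywhere in the string.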
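The difference is easy to check side by side. This is only a sketch with the ASCII classes from the question; the class name LookaheadQuantifierDemo and the constants BROKEN and FIXED are names I made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
final class LookaheadQuantifierDemo {
    // Lookaheads without '*': both inspect the SECOND character only.
    static final Pattern BROKEN =
            Pattern.compile("^(?=.\\d)(?=.[A-Za-z])[A-Za-z0-9]{2,10}$");

    // Lookaheads with '.*': each one may find its character anywhere.
    static final Pattern FIXED =
            Pattern.compile("^(?=.*\\d)(?=.*[A-Za-z])[A-Za-z0-9]{2,10}$");

    public static void main(String[] args) {
        for (String s : new String[] {"a1", "1a", "ab1", "abc", "123"}) {
            System.out.println(s
                    + " broken=" + BROKEN.matcher(s).matches()
                    + " fixed=" + FIXED.matcher(s).matches());
        }
        // The broken pattern rejects every input; the fixed one accepts
        // "a1", "1a" and "ab1" and rejects "abc" and "123".
    }
}
```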

Upvotes: 4

Zarkonnen

Reputation: 22478

At this point it might be better to define which characters (if any) don't count as alpha characters (like spaces, etc?). Then just make it "at least one numeric and one non-numeric character". But I think the problems you're having with the requirement stem from it being a bit silly.
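As a rough sketch of the "at least one numeric and one non-numeric character" idea, without any regex: the class and method names are hypothetical, and treating whitespace as the only kind of character that does not count is purely an assumption for this example.

```java
// Hypothetical helper class, only for illustration.
final class DigitAndNonDigitCheck {
    // Returns true if the input contains at least one Unicode decimal digit
    // and at least one other character that is not whitespace.
    static boolean hasDigitAndNonDigit(String input) {
        boolean sawDigit = false;
        boolean sawOther = false;
        // Iterate by code point so characters outside the BMP are handled too.
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (Character.isDigit(cp)) {
                sawDigit = true;
            } else if (!Character.isWhitespace(cp)) {
                // Assumption: whitespace is the only excluded category.
                sawOther = true;
            }
            i += Character.charCount(cp);
        }
        return sawDigit && sawOther;
    }

    public static void main(String[] args) {
        System.out.println(hasDigitAndNonDigit("a1"));       // true
        System.out.println(hasDigitAndNonDigit("12"));       // false
        System.out.println(hasDigitAndNonDigit("1 "));       // false
        System.out.println(hasDigitAndNonDigit("\u0663x"));  // true
    }
}
```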

Is this for a password? Two-character passwords are not secure at all, and some people may want to use passwords longer than ten characters. Is there actually any reason not to allow much longer passwords?

http://xkcd.com/936/ gives a pretty good overview of what actually makes a password strong. Requiring numbers doesn't help much against a modern attacker but makes the user's life harder. It is better to require a long password.

Upvotes: 0
