Faiz Kidwai

Reputation: 475

Check if string contains only Unicode values [\u0030-\u0039] or [\u0660-\u0669]

I need to check, in Java, whether a string is composed only of Unicode values [\u0030-\u0039] or [\u0660-\u0669]. What is the most efficient way of doing this?

Upvotes: 2

Views: 2694

Answers (5)

tentacle

Reputation: 553

Since these code points represent numerals in two different Unicode blocks, I suggest checking whether each character is a numeral:

// true if every char of s is a Unicode decimal digit (Character.isDigit)
boolean isNumerals(String s) {
    return s.chars().allMatch(Character::isDigit);
}

This will definitely match more than was asked for (Character.isDigit accepts decimal digits from every script), but in some cases, or in a more controlled environment, it may be useful for making the code more readable.
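To see how much more Character.isDigit accepts, here is a small sketch (the class name IsDigitDemo is hypothetical) that runs the same check on digits from other scripts, such as DEVANAGARI DIGIT ZERO (U+0966) and FULLWIDTH DIGIT ZERO (U+FF10):

```java
final class IsDigitDemo {
    // Same idea as the answer: true if every char is a Unicode decimal digit.
    static boolean isNumerals(String s) {
        return s.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        System.out.println(isNumerals("0123"));         // true: ASCII digits
        System.out.println(isNumerals("\u0660\u0669")); // true: Arabic-Indic digits
        System.out.println(isNumerals("\u0966"));       // true: Devanagari digit, also accepted
        System.out.println(isNumerals("\uFF10"));       // true: fullwidth digit, also accepted
        System.out.println(isNumerals("12a"));          // false: contains a letter
    }
}
```

Note that the empty string also passes, because allMatch on an empty stream returns true.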

(edit)

The Java API can also determine the Unicode block of a specific character:

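Character.UnicodeBlock arabic = Character.UnicodeBlock.ARABIC;
Character.UnicodeBlock latin = Character.UnicodeBlock.BASIC_LATIN;

// true if every char lies in the ARABIC or BASIC_LATIN block;
// == is used because UnicodeBlock.of() returns null for code points outside any block
boolean isValidBlock(String s) {
    return s.chars().allMatch(v -> {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(v);
        return block == arabic || block == latin;
    });
}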

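Combined with the check above, this comes close to what the OP asked for, although the ARABIC block also contains the Extended Arabic-Indic digits U+06F0 to U+06F9, which pass both checks. On the plus side, the higher abstraction gives more flexibility, makes the code more readable and does not depend on hard-coding exact code point values.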
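A minimal sketch of the two checks combined per code point (the class and method names are hypothetical). It also shows the caveat that an Extended Arabic-Indic digit such as U+06F5 is accepted, since it is both a digit and inside the ARABIC block:

```java
final class BlockDigitCheck {
    // Accepts a code point only if it is a decimal digit AND lies in the
    // BASIC_LATIN or ARABIC block, i.e. both checks from the answer at once.
    static boolean isLatinOrArabicDigits(String s) {
        return s.codePoints().allMatch(cp -> {
            // UnicodeBlock constants are singletons, so == is safe; of() may return null
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            return Character.isDigit(cp)
                    && (block == Character.UnicodeBlock.BASIC_LATIN
                        || block == Character.UnicodeBlock.ARABIC);
        });
    }

    public static void main(String[] args) {
        System.out.println(isLatinOrArabicDigits("12\u0663")); // true
        System.out.println(isLatinOrArabicDigits("\u0966"));   // false: DEVANAGARI block
        System.out.println(isLatinOrArabicDigits("\u06F5"));   // true: also in ARABIC block
    }
}
```

If U+06F0 to U+06F9 must be rejected, an explicit range comparison is still needed.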

Upvotes: 2

Superluminal

Reputation: 977

Use \x{...} to match Unicode characters by their hexadecimal code point:

^([\x{0030}-\x{0039}\x{0660}-\x{0669}]+)$

If the pattern should also match an empty string, use * instead of +.

Use this if you don't want to allow mixing characters from the two sets you provided:

^([\x{0030}-\x{0039}]+|[\x{0660}-\x{0669}]+)$

https://regex101.com/r/xqWL4q/6

As Holger mentioned in the comments below, \x{0030}-\x{0039} is equivalent to 0-9, so it could be substituted, which is more readable.
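In Java source, the backslash of \x{...} has to be doubled inside the string literal (supported by java.util.regex since Java 7). A sketch with hypothetical class and field names; the patterns are compiled once and reused, whereas String.matches compiles the regex on every call:

```java
import java.util.regex.Pattern;

final class RegexDigitCheck {
    // Digits from both ranges, mixed freely.
    static final Pattern MIXED =
            Pattern.compile("^([0-9\\x{0660}-\\x{0669}]+)$");
    // Digits from only one of the two ranges.
    static final Pattern SINGLE_RANGE =
            Pattern.compile("^([0-9]+|[\\x{0660}-\\x{0669}]+)$");

    public static void main(String[] args) {
        String mixed = "12\u0663";
        System.out.println(MIXED.matcher(mixed).matches());                 // true
        System.out.println(SINGLE_RANGE.matcher(mixed).matches());          // false
        System.out.println(SINGLE_RANGE.matcher("\u0661\u0662").matches()); // true
    }
}
```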

Upvotes: 5

Holger

Reputation: 298233

As said here, it’s not clear whether you want to allow possibly mixed occurrences of these digits or require the whole string to come from either one of these ranges.

A simple check for mixed digits would be string.matches("[0-9٠-٩]*"), or, to avoid confusing changes of the reading/writing direction or if your source code encoding doesn’t support all characters, string.matches("[0-9\u0660-\u0669]*").

Checking whether the string matches either range can be done using
string.matches("[0-9]*")||string.matches("[٠-٩]*") or
string.matches("[0-9]*")||string.matches("[\u0660-\u0669]*").

An alternative would be
string.chars().allMatch(c -> c >= '0' && c <= '9' || c >= '٠' && c <= '٩').
Or to check for either, string.chars().allMatch(c -> c >= '0' && c <= '9') || string.chars().allMatch(c -> c >= '٠' && c <= '٩')
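Since the question asks about efficiency, the same range comparisons can also be written as a plain charAt loop that creates no stream or regex objects. This is a sketch with hypothetical names; the thread does not measure whether it is faster in practice:

```java
final class LoopDigitCheck {
    // Mixed occurrences allowed: every char in '0'..'9' or U+0660..U+0669.
    static boolean isMixedDigits(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!(c >= '0' && c <= '9' || c >= '\u0660' && c <= '\u0669')) {
                return false;
            }
        }
        return true;
    }

    // Either range, but not both: the first char decides which range applies.
    static boolean isSingleRangeDigits(String s) {
        if (s.isEmpty()) {
            return true; // matches the behavior of the * patterns above
        }
        char lo = s.charAt(0) <= '9' ? '0' : '\u0660';
        char hi = (char) (lo + 9);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < lo || c > hi) {
                return false;
            }
        }
        return true;
    }
}
```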

Upvotes: 4

stonar96

Reputation: 1460

Here is a solution that works without regex for arbitrary Unicode code points, including those outside the Basic Multilingual Plane.

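import java.util.HashSet;
import java.util.Set;

// Fill this set with the allowed code points before calling test().
private final Set<Integer> codePoints = new HashSet<Integer>();

public boolean test(String string) {
    // Advance by Character.charCount so a surrogate pair is read as one code point.
    for (int i = 0, codePoint = 0; i < string.length(); i += Character.charCount(codePoint)) {
        codePoint = string.codePointAt(i);

        if (!codePoints.contains(codePoint)) {
            return false;
        }
    }

    return true;
}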
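A runnable sketch of the same idea with the set populated for the question's two ranges (the class name and constructor are hypothetical). It uses String.codePoints() (Java 8+), which, like the loop above, combines surrogate pairs into single code points:

```java
import java.util.HashSet;
import java.util.Set;

final class CodePointSetCheck {
    private final Set<Integer> codePoints = new HashSet<>();

    CodePointSetCheck() {
        // Allowed code points from the question: U+0030..U+0039 and U+0660..U+0669.
        for (int cp = 0x0030; cp <= 0x0039; cp++) {
            codePoints.add(cp);
        }
        for (int cp = 0x0660; cp <= 0x0669; cp++) {
            codePoints.add(cp);
        }
    }

    // Equivalent to the answer's loop: every code point must be in the set.
    boolean test(String string) {
        return string.codePoints().allMatch(cp -> codePoints.contains(cp));
    }
}
```

Supporting other characters, including ones outside the BMP, only requires adding their code points to the set.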

Upvotes: 0

Simple solution using a regex (see also the much better explanation by @Predicate: https://stackoverflow.com/a/60597367/12558456):

private boolean legalRegex(String s) {
    return s.matches("^([\u0030-\u0039]|[\u0660-\u0669])*$");
}

Faster but uglier solution (needs a HashSet of the allowed chars):

private boolean legalCharactersOnly(String s) {
    // allowedCharacters: a Set<Character> holding every allowed char
    for (char c : s.toCharArray()) {
        if (!allowedCharacters.contains(c)) {
            return false;
        }
    }
    return true;
}
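A self-contained sketch of this approach with the set declared and filled (the class name and the static initializer are assumptions; the answer only says a HashSet of allowed chars is needed). It iterates with charAt instead of toCharArray, which avoids copying the string into a new array:

```java
import java.util.HashSet;
import java.util.Set;

final class CharSetCheck {
    // Hypothetical declaration of the set the answer relies on.
    private static final Set<Character> allowedCharacters = new HashSet<>();

    static {
        // '\u0030'..'\u0039' are the ASCII digits, '\u0660'..'\u0669' the Arabic-Indic digits.
        for (char c = '\u0030'; c <= '\u0039'; c++) {
            allowedCharacters.add(c);
        }
        for (char c = '\u0660'; c <= '\u0669'; c++) {
            allowedCharacters.add(c);
        }
    }

    static boolean legalCharactersOnly(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!allowedCharacters.contains(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```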

Upvotes: 1
