supertonsky

Reputation: 2743

Java RegEx match string containing non-ASCII that exceeds given length

How do I determine if the string contains non-ASCII AND exceeds 5 characters using RegEx?

I tried this pattern:

(?=\P{ASCII})(?=^.{6,}$)

I thought (?=) meant (?=must be this)(?=and this too).

Given the input 1巻345 (5 characters), I expect Matcher.find() to return false.

Given the input 1巻34567 (7 characters), I expect Matcher.find() to return true.

But it always returns false for both inputs.

Please also explain why my pattern doesn't work.

UPDATE: I figured out the right pattern: (\P{ASCII})(.{6,})

Now I only need to know why the (?=) version doesn't work.

Upvotes: 1

Views: 1156

Answers (1)

HamZa

Reputation: 14921

What you're looking for is:

^(?=.*\P{ASCII}).{6,}$

So let's explain it:

^                       # Start of string
    (?=                 # Look ahead and make sure there is
        .*              # Anything, zero or more times (greedy)
        \P{ASCII}       # Followed by a non-ASCII character
    )                   # End of lookahead (nothing is consumed)
    .{6,}               # Match any character 6 or more times
$                       # End of string
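If you want to keep those annotations inside the Java code itself, Pattern.COMMENTS lets the regex contain whitespace and # comments. The sketch below is only an illustration: the class name NonAsciiLengthCheck and the helper check() are made up for the example, and the test strings use the escape \u5dfb (the character 巻) so the source compiles regardless of the compiler's file encoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonAsciiLengthCheck {

    // The answer's pattern written with Pattern.COMMENTS. In this mode whitespace
    // is ignored and "#" starts a comment that runs to the end of the line, so
    // every line must end with "\n" or the comment would swallow the rest.
    static final Pattern NON_ASCII_AND_LONG = Pattern.compile(
            "^              # start of string\n" +
            "(?=            # lookahead: somewhere ahead there is\n" +
            "   .*          #   anything, zero or more times\n" +
            "   \\P{ASCII}  #   followed by a non-ASCII character\n" +
            ")              # end of lookahead (consumes nothing)\n" +
            ".{6,}          # then 6 or more characters\n" +
            "$              # end of string\n",
            Pattern.COMMENTS);

    // Hypothetical helper: true if the whole input is 6+ chars and contains non-ASCII.
    static boolean check(String input) {
        Matcher m = NON_ASCII_AND_LONG.matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        System.out.println(check("1\u5dfb345"));   // false: only 5 characters
        System.out.println(check("1\u5dfb34567")); // true: 7 characters, one non-ASCII
        System.out.println(check("1234567"));      // false: long enough but all ASCII
    }
}
```

Because the pattern is anchored with ^ and $, find() and matches() behave the same here.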

Let's analyse why your pattern (?=\P{ASCII})(?=^.{6,}$) fails:

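  1. (?=\P{ASCII}): you first tell the regex engine to check that the character at the current position is non-ASCII. A lookahead is zero-width: it checks without consuming anything, so the engine stays at the same position.
  2. (?=^.{6,}$): then, from that same position, you tell the engine to check that it is the beginning of the string (the ^ inside the lookahead) and that 6 or more characters follow up to the end.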

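Both lookaheads are tested at the same position, and ^ only matches at the start of the string, so the whole pattern can only succeed at position 0. In effect you're asking whether the first character is non-ASCII. With your input 1巻34567 that is false, since the first character is 1. Try 巻345671 as input and find() should return true.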
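A small sketch to see this behaviour for yourself, running the question's pattern against both inputs and against 巻345671. The class name OriginalPatternDemo and the check() helper are just for illustration; \u5dfb is 巻.

```java
import java.util.regex.Pattern;

public class OriginalPatternDemo {

    // The pattern from the question. Both lookaheads are zero-width, so they are
    // evaluated at the same position. ^ inside the second one only succeeds at
    // index 0, which forces the first lookahead to test the first character.
    static final Pattern ORIGINAL = Pattern.compile("(?=\\P{ASCII})(?=^.{6,}$)");

    // find() tries every start position, but only index 0 can satisfy ^.
    static boolean check(String input) {
        return ORIGINAL.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(check("1\u5dfb345"));   // false
        System.out.println(check("1\u5dfb34567")); // false: first char '1' is ASCII
        System.out.println(check("\u5dfb345671")); // true: first char is non-ASCII
    }
}
```

When it does succeed, the match is empty (both parts are lookaheads), which is why find() can return true even though nothing is consumed.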

Note that . doesn't match a newline by default. So you might want to set the s modifier (DOTALL) by adding (?s):

(?s)^(?=.*\P{ASCII}).{6,}$
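To illustrate the difference, here is a sketch with an input whose only non-ASCII character comes after a line break. The class and field names are hypothetical; passing Pattern.DOTALL to Pattern.compile should be equivalent to the embedded (?s) flag.

```java
import java.util.regex.Pattern;

public class DotAllDemo {

    // Without (?s): "." stops at the newline, so the lookahead cannot reach
    // the non-ASCII character on the second line.
    static final Pattern WITHOUT_S = Pattern.compile("^(?=.*\\P{ASCII}).{6,}$");

    // With (?s): "." also matches line terminators.
    static final Pattern WITH_S = Pattern.compile("(?s)^(?=.*\\P{ASCII}).{6,}$");

    public static void main(String[] args) {
        // 7 characters: '1', '2', '3', newline, \u5dfb, '5', '6'
        String input = "123\n\u5dfb56";
        System.out.println(WITHOUT_S.matcher(input).find()); // false
        System.out.println(WITH_S.matcher(input).find());    // true
    }
}
```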

Upvotes: 5
