ddayan

Reputation: 4142

Regex in Java: at most one dot

I expect \b([a-zA-Z]+\.?)\b or \b([a-zA-Z]+\.{0,1})\b to match at least one letter followed by at most one dot.

But the matcher finds "ab" for each of the inputs "ab", "ab." and "ab..", whereas I'm expecting the following:

"ab" is found for input "ab"
"ab." is found for input "ab."
nothing is found for input "ab.."

If I change the regex to use 0 instead of a dot, e.g. \b([a-zA-Z]+0?)\b, then it works as expected:

"ab" is found for input "ab"
"ab0" is found for input "ab0"
nothing is found for input "ab00"

So, how do I get my regex to work?

Upvotes: 1

Views: 1645

Answers (2)

Daniel Martin

Reputation: 23548

The issue is that \b matches between a word character and a non-word character, not between whitespace and non-whitespace as you seem to expect. The difference between a . and a 0 is that 0 is considered a "word" character, but . isn't.

Here is what's happening in your examples. Take the last string, ab.., and see where \b can match:

   a b . .
  ^ x ^ x x

Remember, \b matches between characters. I've shown where \b could match with a ^, and where it can't with an x. Since \b can only match in front of a or right after b, we're limited to just matching ab so long as you have those \b bits in there.
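
To check this yourself, here is a minimal Java sketch. The class and method names are hypothetical, chosen only for illustration. It lists the indices where \b matches in ab.. and then runs the pattern from the question against the three inputs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class WordBoundaryDemo {

    // The pattern from the question.
    static final Pattern ORIGINAL = Pattern.compile("\\b([a-zA-Z]+\\.?)\\b");

    // Collects every index at which \b matches in the input.
    static List<Integer> boundaryPositions(String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \b matches only before 'a' (index 0) and after 'b' (index 2).
        System.out.println(boundaryPositions("ab.."));  // [0, 2]
        for (String input : new String[] {"ab", "ab.", "ab.."}) {
            // The trailing \b forces the match to end right after 'b', so "ab" is found each time.
            System.out.println(input + " -> " + firstMatch(ORIGINAL, input));
        }
    }
}
```

The same helper with \b([a-zA-Z]+0?)\b finds nothing in ab00, because 0 is a word character and there is no boundary between b and 0 or between the two zeros.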

I think you want something like \bab\.?(?!\S). That says "word boundary, then a then b then maybe a single dot where there is NOT a non-space character immediately after."
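
As a sketch of how that could look in Java, assuming you swap the literal ab for the [a-zA-Z]+ from your question (that substitution and the class name are my assumptions, not something you stated):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LookaheadDemo {

    // Letters, an optional dot, and then no non-space character immediately after.
    static final Pattern ONE_DOT = Pattern.compile("\\b[a-zA-Z]+\\.?(?!\\S)");

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = ONE_DOT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("ab"));    // ab
        System.out.println(firstMatch("ab."));   // ab.
        System.out.println(firstMatch("ab.."));  // null
    }
}
```

The negative lookahead (?!\S) takes over the job the trailing \b was meant to do: it succeeds at the end of the input or before whitespace, regardless of whether the previous character was a letter or a dot.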

If I've misunderstood your question, and you do want the expression to find ab. in the string ab.c or find ab in abc, you can use \bab\.?(?!\.) instead.
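
A short Java sketch of this looser variant, again with a hypothetical class name, shows that it only rejects a second dot:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class NoSecondDotDemo {

    // "ab", an optional dot, and then anything except another dot.
    static final Pattern NO_SECOND_DOT = Pattern.compile("\\bab\\.?(?!\\.)");

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = NO_SECOND_DOT.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("ab.c"));  // ab.
        System.out.println(firstMatch("abc"));   // ab
        System.out.println(firstMatch("ab.."));  // null
    }
}
```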

Upvotes: 5

MarcoS

Reputation: 13564

  • \b([a-zA-Z]+\.+)\b is "at least one letter followed by at least one dot"
  • \b([a-zA-Z]+\.{0,1})\b is "at least one letter followed by zero or one dot"

Upvotes: 0
