harsh

Reputation: 21

Why is . excluded by the word boundary in my regex?

I have the following regex:

\b[_\.][0-9]{1,}[a-zA-Z]{0,}[_]{0,}\b

My input strings are:

  1. _49791626567342fYbYzeRESzHsQUgwjimkIfW
  2. .49791626567342fYbYzeRESzHsQUgwjimkIfW

I expected the regex to match both 1 and 2, but it only matches the first. Can you help me find the mistake in the regex?

Upvotes: 0

Views: 46

Answers (1)

Sebastian Proske

Reputation: 8413

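A word boundary (\b) is the position between a word character (letter, digit, or underscore) and either a non-word character or the start or end of the string. The underscore in your first input is a word character, so \b matches between the start of the string and the underscore. The dot in your second input is a non-word character, so there simply is no word boundary between the start of the string and the dot, and the match fails.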
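To make the difference concrete, here is a minimal sketch in Python's re module (the question does not name a regex flavor, so Python is an assumption; the variable names are only illustrative). It runs the regex from the question against both inputs:

```python
import re

# The regex from the question, unchanged.
pattern = re.compile(r"\b[_\.][0-9]{1,}[a-zA-Z]{0,}[_]{0,}\b")

underscore_input = "_49791626567342fYbYzeRESzHsQUgwjimkIfW"
dot_input = ".49791626567342fYbYzeRESzHsQUgwjimkIfW"

# "_" is a word character, so \b matches between the start of the string and "_".
underscore_match = pattern.search(underscore_input)

# "." is a non-word character, so there is no \b in front of it. The only
# boundary near the start sits between "." and "4", and "4" is not in [_\.].
dot_match = pattern.search(dot_input)

print(underscore_match is not None)  # True
print(dot_match is not None)         # False
```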

In this case you can use anchors instead, to mark the start and end of the string, like

^[_\.][0-9]{1,}[a-zA-Z]{0,}[_]{0,}$

You can also shorten your regex a bit by using the * and + quantifiers and dropping the unnecessary escape, as suggested by Toto:

^[_.][0-9]+[a-zA-Z]*_*$
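As a quick check, the sketch below (again assuming Python's re module, with illustrative variable names and one made-up negative sample) runs the long and the shortened anchored patterns against both inputs from the question. On these samples the two patterns agree:

```python
import re

# Anchored version of the original regex and the shortened equivalent.
verbose = re.compile(r"^[_\.][0-9]{1,}[a-zA-Z]{0,}[_]{0,}$")
short = re.compile(r"^[_.][0-9]+[a-zA-Z]*_*$")

samples = [
    "_49791626567342fYbYzeRESzHsQUgwjimkIfW",
    ".49791626567342fYbYzeRESzHsQUgwjimkIfW",
    "x.123abc",  # hypothetical negative sample: does not start with "_" or "."
]

# For each sample, record whether the verbose and the short pattern match.
anchored_results = [
    (verbose.search(s) is not None, short.search(s) is not None)
    for s in samples
]

print(anchored_results)  # [(True, True), (True, True), (False, False)]
```

Note that anchors only help when the whole string is the token; they will not find the token inside a longer string.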

You can also use lookahead and lookbehind (if your regex engine supports them) to build yourself a custom boundary.
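One possible way to build such a custom boundary is sketched below. The exact pattern is an assumption, not something given in this answer: it uses a negative lookbehind (?<!\w) and a negative lookahead (?!\w) to require that no word character sits directly before or after the token. It is written for Python's re module, and the sample strings other than the two from the question are made up for illustration.

```python
import re

# Assumed custom boundary: no word character directly before or after the token.
custom_boundary = re.compile(r"(?<!\w)[_.][0-9]+[a-zA-Z]*_*(?!\w)")

underscore_input = "_49791626567342fYbYzeRESzHsQUgwjimkIfW"
dot_input = ".49791626567342fYbYzeRESzHsQUgwjimkIfW"

# Both inputs from the question match in full.
full_matches = [
    custom_boundary.fullmatch(s) is not None
    for s in (underscore_input, dot_input)
]

# Unlike the anchored version, this also finds tokens inside longer text,
# but skips ".456def" because it is glued to the word character "a".
embedded = custom_boundary.findall("foo .123abc bar a.456def")

print(full_matches)  # [True, True]
print(embedded)      # ['.123abc']
```

Whether this or the anchored version is the better fit depends on whether the token can appear inside a longer string.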

Upvotes: 3
