Liang Zhou

Reputation: 2165

What is a good regex to match a word with optional space in it?

Digits are optional, and are only allowed at the end of a word.

Spaces are optional, and are only allowed in the middle of a word.

I am pretty much just trying to match the possible month names in a few languages, say English and Vietnamese.

For example, the following are valid matches:

'June' 'tháng 6'

But the following are not, because of the trailing or leading space: 'June ' ' June'

These are my test cases: https://regex101.com/r/pZ0mN3/2.

As you can see, I came up with ^\S[\S ]+\S$, which kind of works, but I wonder if there's a better way to do it.

Upvotes: 3

Views: 2429

Answers (1)

Wiktor Stribiżew

Reputation: 626747

To match a string with no leading or trailing whitespace in the JavaScript regex flavor, you have several options:

  • Require the first and the last character to be non-whitespace with \S (=[^\s]). This can be done with, say, ^\S[\S\s]*\S$. This regex requires at least 2 characters in the string. Your regex requires 3 characters in the input since you used +. Its [\S ] class also allows only the regular space between them, so it won't allow other Unicode whitespace characters.

  • You may use a combination of grouping with optional quantifiers (those allowing zero-length matches). See ^\S(?:\s*\S+)*$ (in the demo, \s is swapped for a pattern that cannot match a line break, since it is a multiline demo). The \S at the beginning matches a non-whitespace char. A non-capturing group follows that is quantified with * (zero or more occurrences), so it matches 0+ sequences of 0+ whitespace chars followed by 1+ non-whitespace chars. This is a good expression for flavors like RE2 that do not support lookarounds but do support quantified groups.

  • You may use a lookahead to require the first and last characters to be non-whitespace: ^(?=[\S\s]*\S$)\S[\S\s]*$, where (?=[\S\s]*\S$) requires the last char to be a non-whitespace char and the \S after the lookahead requires the first char to be a non-whitespace char. [\S\s]* matches 0+ characters of any kind. This will match 1-char strings, but won't match empty strings.

  • If your regex for strings with no leading/trailing whitespace should also match an empty string, use 2 negative lookaheads: ^(?!\s)(?![\S\s]*\s$)[\S\s]*$. The (?!\s) lookahead fails the match if there is leading whitespace, (?![\S\s]*\s$) does the same for trailing whitespace, and [\S\s]* matches 0+ characters of any kind. If lookarounds are not supported, use ^(?:\S(?: *\S+)*)?$, which is much less efficient.
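
To compare the four options side by side, here is a minimal sketch in JavaScript. The regexes are copied from the bullets above; the object keys (twoPlusChars, noLookaround, lookahead, allowsEmpty) and the sample strings are just illustrative names, not part of any API.

```javascript
// Patterns copied from the four options above; the key names are hypothetical labels.
const patterns = {
  twoPlusChars: /^\S[\S\s]*\S$/,                 // needs at least 2 characters
  noLookaround: /^\S(?:\s*\S+)*$/,               // quantified group, no lookarounds
  lookahead: /^(?=[\S\s]*\S$)\S[\S\s]*$/,        // accepts 1-char strings
  allowsEmpty: /^(?!\s)(?![\S\s]*\s$)[\S\s]*$/,  // also accepts the empty string
};

// Illustrative inputs: the question's examples plus two edge cases.
const samples = ["June", "tháng 6", "June ", " June", "a", ""];

// Build one row per sample, recording which patterns accept it.
const results = samples.map((s) => {
  const row = { input: JSON.stringify(s) };
  for (const [name, re] of Object.entries(patterns)) {
    row[name] = re.test(s);
  }
  return row;
});

console.table(results);
```

All four reject 'June ' and ' June'. They differ only on the edge cases: the first option rejects the single character 'a', and only the last one accepts the empty string.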

If you do not need to match arbitrary chars between the non-whitespace chars, you may revert [\s\S] to your [\S ]. In PCRE, a horizontal whitespace char can be matched with \h. In .NET and other flavors that support Unicode properties, you can use [\t\p{Zs}] to match any horizontal whitespace. In JS, [^\S\r\n\f\v\u2028\u2029] can be used for that purpose.
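
As a rough sketch of the JS variant, the horizontal-whitespace class can be dropped into the lookaround-free pattern in place of \s. The combined regex and the name singleLine are my own illustration under that assumption, so treat them as one possible way to wire it up rather than the only one.

```javascript
// Assumption: the lookaround-free pattern ^\S(?:\s*\S+)*$ with \s replaced by
// the horizontal-whitespace class [^\S\r\n\f\v\u2028\u2029] quoted above.
// "singleLine" is a hypothetical name for this combination.
const singleLine = /^\S(?:[^\S\r\n\f\v\u2028\u2029]*\S+)*$/;

// Illustrative inputs: a space, a tab, and a no-break space are horizontal
// whitespace; a line feed is not, so it may not appear inside the word.
const checks = [
  ["tháng 6", singleLine.test("tháng 6")],
  ["tháng\\t6", singleLine.test("tháng\t6")],
  ["tháng\\u00A06", singleLine.test("tháng\u00A06")],
  ["tháng\\n6", singleLine.test("tháng\n6")],
  ["June ", singleLine.test("June ")],
];

for (const [label, ok] of checks) {
  console.log(`${label}: ${ok}`);
}
```

The first three inputs match, while the one with an embedded line feed and the one with a trailing space do not.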

Note that some regex flavors do not support non-capturing groups. In that case, you may replace every (?: with ( in the above patterns.

Upvotes: 2
