Reputation: 2165
Digits are optional, and are only allowed at the end of a word.
Spaces are optional, and are only allowed in the middle of a word.
I am pretty much just trying to match the possible months in a few languages, say, English and Vietnamese.
For example, the following are valid matches:
'June'
'tháng 6'
But the following are not, because of the leading or trailing space:
'June '
' June'
These are my test cases: https://regex101.com/r/pZ0mN3/2.
As you can see, I came up with ^\S[\S ]+\S$ which is kind of working, but I wonder if there's a better way to do it.
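For reference, here is a minimal sketch of how the pattern above behaves on the examples from the question. The variable names and the extra two-character sample 'Ju' are assumptions added for illustration, not part of the original test cases.

```javascript
// The pattern from the question: a non-whitespace char, then one or more
// chars that are either non-whitespace or a literal space, then a
// non-whitespace char.
const monthPattern = /^\S[\S ]+\S$/;

// 'Ju' is a hypothetical extra sample showing the 3-character minimum.
const samples = ['June', 'tháng 6', 'June ', ' June', 'Ju'];
const results = samples.map((s) => monthPattern.test(s));

samples.forEach((s, i) => {
  console.log(JSON.stringify(s), '->', results[i]);
});
// "June" -> true
// "tháng 6" -> true
// "June " -> false
// " June" -> false
// "Ju" -> false
```

The last line shows a side effect of the + quantifier: any input shorter than three characters is rejected.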
Upvotes: 3
Views: 2429
Reputation: 626747
To match a string with no leading or trailing whitespace in the JavaScript regex flavor, you have several options:
Require the first and the last character to be non-whitespace with \S (equivalent to [^\s]), for example with the pattern ^\S[\S\s]*\S$ which requires at least 2 characters in the string. Your regex requires at least 3 characters in the input, since you used the + quantifier. Because [\S ] admits only a literal space between the non-whitespace characters, your regex also won't allow other whitespace there, such as tabs or some Unicode spaces.
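A short sketch of the first option in JavaScript. The constant name and the sample inputs are assumptions chosen for illustration.

```javascript
// First and last characters must be non-whitespace; anything may sit between.
const atLeastTwoChars = /^\S[\S\s]*\S$/;

console.log(atLeastTwoChars.test('June'));      // true
console.log(atLeastTwoChars.test('tháng\t6'));  // true: a tab in the middle is accepted
console.log(atLeastTwoChars.test('J'));         // false: at least 2 characters are required
console.log(atLeastTwoChars.test(' June'));     // false: leading space
console.log(atLeastTwoChars.test('June '));     // false: trailing space
```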
You may use a combination of grouping with optional quantifiers (those allowing zero-length matches). See ^\S(?:\s*\S+)*$ (in the demo, \s is replaced with a literal space since it is a multiline demo). The \S at the beginning matches a non-whitespace char, and then a non-capturing group follows that is quantified with * (zero or more occurrences): it matches 0+ sequences of 0+ whitespace characters followed by 1+ non-whitespace characters. This is a good expression for flavors like RE2 that do not support lookarounds but do support quantified groups.
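The quantified-group pattern can be tried out as follows. This is only a sketch; the constant name and the inputs are illustrative assumptions.

```javascript
// One non-whitespace char, then zero or more repetitions of
// "optional whitespace followed by one or more non-whitespace chars".
const quantifiedGroup = /^\S(?:\s*\S+)*$/;

console.log(quantifiedGroup.test('J'));        // true: a single char is enough
console.log(quantifiedGroup.test('tháng 6'));  // true
console.log(quantifiedGroup.test('a  b'));     // true: runs of inner whitespace are fine
console.log(quantifiedGroup.test('June '));    // false: trailing space
console.log(quantifiedGroup.test(' June'));    // false: leading space
console.log(quantifiedGroup.test(''));         // false: empty string
```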
You may use a lookahead to require the last character to be non-whitespace: ^(?=[\S\s]*\S$)\S[\S\s]*$ where (?=[\S\s]*\S$) requires the last char to be non-whitespace and the \S after the lookahead requires the first char to be non-whitespace. [\S\s]* matches 0+ characters of any kind. This will match 1-char strings, but won't match empty strings.
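A minimal sketch of the lookahead variant, with hypothetical names and sample inputs, showing that a single character matches while the empty string does not:

```javascript
// The lookahead checks the last character; the \S after it checks the first.
const withLookahead = /^(?=[\S\s]*\S$)\S[\S\s]*$/;

console.log(withLookahead.test('J'));        // true: 1-char strings match
console.log(withLookahead.test('tháng 6'));  // true
console.log(withLookahead.test(''));         // false: empty strings do not match
console.log(withLookahead.test('June '));    // false: trailing space
console.log(withLookahead.test(' June'));    // false: leading space
```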
If your regex for strings with no leading/trailing whitespace should also match an empty string, use 2 negative lookaheads, as in ^(?!\s)(?![\S\s]*\s$)[\S\s]*$ where the (?!\s) lookahead fails the match if there is leading whitespace, (?![\S\s]*\s$) does the same for trailing whitespace, and [\S\s]* matches 0+ characters of any kind. If lookarounds are not supported, use ^(?:\S(?: *\S+)*)?$ instead, which is much less efficient.
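The two empty-string-tolerant patterns can be compared side by side. This is a sketch with assumed names and inputs; note that the lookaround-free pattern only allows literal spaces between words, so the two differ on an inner tab.

```javascript
const withLookaheads = /^(?!\s)(?![\S\s]*\s$)[\S\s]*$/;
const withoutLookarounds = /^(?:\S(?: *\S+)*)?$/;

// Hypothetical inputs; the last one contains a tab between the words.
const inputs = ['', 'June', 'tháng 6', ' June', 'June ', 'tháng\t6'];

const lookaheadResults = inputs.map((s) => withLookaheads.test(s));
const fallbackResults = inputs.map((s) => withoutLookarounds.test(s));

console.log(lookaheadResults); // [ true, true, true, false, false, true ]
console.log(fallbackResults);  // [ true, true, true, false, false, false ]
```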
If you do not need to match arbitrary characters (such as tabs or line breaks) between the non-whitespace chars, you may change [\S\s] back to your [\S ] class. In PCRE, a horizontal whitespace character can be matched with \h; in .NET and other flavors that support Unicode properties, you can use [\t\p{Zs}] to match any horizontal whitespace. In JS, [^\S\r\n\f\v\u2028\u2029] can be used for that purpose.
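As an illustration, the JS horizontal-whitespace class can be plugged into the quantified-group pattern. Combining them this way is an assumption made for this sketch, as are the names and sample inputs.

```javascript
// Whitespace that is not a line break, form feed, or vertical tab.
const horizontalSpace = '[^\\S\\r\\n\\f\\v\\u2028\\u2029]';

// Hypothetical combination: only horizontal whitespace is allowed between words.
const singleLineTrimmed = new RegExp(`^\\S(?:${horizontalSpace}*\\S+)*$`);

console.log(singleLineTrimmed.test('tháng 6'));        // true: regular space
console.log(singleLineTrimmed.test('tháng\t6'));       // true: tab
console.log(singleLineTrimmed.test('tháng\u00A06'));   // true: no-break space
console.log(singleLineTrimmed.test('tháng\n6'));       // false: line break inside
console.log(singleLineTrimmed.test(' June'));          // false: leading space
```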
Note that some regex flavors do not support non-capturing groups; in those, you may replace every (?: with ( in the above patterns.
Upvotes: 2