Reputation: 3754
I'm trying to match a string that:
starts with [A-Z]
contains [a-zA-Z- '\u00E0-\u00EF] (Latin-1 Supplement - Match Unicode Block Range)
does not end with [- '] or have [- '] next to one another
has at least 2 characters
I've been trying the following:
new RegExp(/^[A-Z](?!.*[- ']$).*[a-zA-Z- '\u00E0-\u00EF]$/);
My problem isn't understanding what regular expressions do, but knowing whether mine is correct. It's very easy to write a regex that looks like it should work but misses a few cases.
Any help would be much appreciated.
Edit
Valid strings: Marie-Noëlle Tranchant, Jean-François Copé, ...
Upvotes: 1
Views: 262
Reputation:
Edit - redo
After revisiting this thread, I noticed these comments:
"does not have [- '] next to one another" all 9 possibilities here or just the three of the same character doubled up? – jswolf19 2 days ago
@jswolf19 do not have : 'space''space', -- or ''. – Stack 101 2 days ago
In light of this, you have to go with what @jswolf19 did.
His regex could probably be simplified a little more:
pcre:
/^[A-Z](?:([\- '])(?!$|\1)|[a-zA-Z\x{E0}-\x{EF}])+$/
js:
/^[A-Z](?:([\- '])(?!$|\1)|[a-zA-Z\u00E0-\u00EF])+$/
expanded JavaScript (for reading only; JavaScript regex literals have no free-spacing mode):
^                            # start of string
[A-Z]                        # single A-Z char
(?:                          # non-capture group
    ([\- '])                 #   capture group 1, single char from: [- ']
    (?! $ | \1 )             #   not the end of string nor the
                             #   char captured in group 1 (backreference)
  |                          #  OR,
    [a-zA-Z\u00E0-\u00EF]    #   a single char from: [a-zA-Z\u00E0-\u00EF]
)+                           # end non-capture group, do 1 or more times
$                            # end of string
Please test answers before you mark them as correct. Others may visit this thread in the future.
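In that spirit, here is a minimal sketch of how the js regex above could be checked against a handful of inputs. Only the first two names come from the question; the other inputs, the expected values, and the names noDoubledSeparator, redoCases and redoFailures are my own assumptions, chosen to probe one rule each.

```javascript
// Regex copied from the "js:" line above.
const noDoubledSeparator = /^[A-Z](?:([\- '])(?!$|\1)|[a-zA-Z\u00E0-\u00EF])+$/;

// [input, expected result]; only the first two inputs come from the question,
// the rest are made-up probes for each rule.
const redoCases = [
  ["Marie-Noëlle Tranchant", true],
  ["Jean-François Copé", true],
  ["Anne- Marie", true],   // two different separators in a row are allowed here
  ["Anne--Marie", false],  // doubled hyphen
  ["Anne  Marie", false],  // doubled space
  ["O''Brien", false],     // doubled apostrophe
  ["Anne-", false],        // ends with a separator
  ["A", false],            // fewer than 2 characters
  ["anne", false],         // does not start with [A-Z]
];

// Collect every case where the regex disagrees with the expectation.
const redoFailures = redoCases.filter(
  ([input, expected]) => noDoubledSeparator.test(input) !== expected
);

console.log(redoFailures.length === 0 ? "all cases pass" : redoFailures);
```

Note the "Anne- Marie" case: this regex only rejects the same separator doubled up, which is what the comments asked for.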
Upvotes: 1
Reputation: 120586
/^[A-Z](?:[- ']?[a-zA-Z\u00E0-\u00EF])+$/
Below is a proof of why this meets your criteria. If you change the non-capturing group (?:...) to a (...) then it is also the shortest regexp that meets your criteria.
starts with [A-Z]: because of the ^[A-Z].
contains [a-zA-Z- '\u00E0-\u00EF] (Latin-1 Supplement - Match Unicode Block Range), any other character is forbidden: because the entire thing must match character sets containing only those characters.
does not end with [- '] or have [- '] next to one another: because [- '] is restricted to zero or one occurrence per following occurrence of [a-zA-Z\u00E0-\u00EF].
has at least 2 characters: because the [A-Z] matches at least one character and the + after the (?:...) group requires another one.
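The proof can also be spot-checked by running the regex. This is only a sketch: the inputs other than the name from the question are made up, one per criterion, and strictName, strictCases and strictFailures are hypothetical names.

```javascript
// Regex copied from the top of this answer.
const strictName = /^[A-Z](?:[- ']?[a-zA-Z\u00E0-\u00EF])+$/;

// [input, expected result], one probe per criterion in the proof.
const strictCases = [
  ["Jean-François Copé", true],
  ["Jo", true],            // the shortest shape that can match
  ["jean", false],         // does not start with [A-Z]
  ["Je4n", false],         // contains a forbidden character
  ["Jean-", false],        // ends with [- ']
  ["Jean--Luc", false],    // same separator twice in a row
  ["Jean- Luc", false],    // two different separators in a row
  ["J", false],            // fewer than 2 characters
];

// Collect every case where the regex disagrees with the expectation.
const strictFailures = strictCases.filter(
  ([input, expected]) => strictName.test(input) !== expected
);

console.log(strictFailures.length === 0 ? "all cases pass" : strictFailures);
```

This regex rejects all 9 separator pairs, including mixed ones like "- ", so it is stricter than a rule that only forbids the same character doubled up.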
Upvotes: 3
Reputation: 78590
A very basic way to test a regex is to take a literal string, e.g. "blah this is text", and call the .match method on it. You can open a JS console (Ctrl + Shift + J in Chrome) and run it directly to see what it returns:
"Marie-Noëlle Tranchant".match(/^[A-Z][-a-zA-Z '\u00E0-\u00EF]*[^- ']$/);
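To make the return values concrete, here is a small sketch of what to look for in the console. The inputs after the first one are made up, and looseName and hit are hypothetical names. Probing with a few bad inputs suggests this particular regex is more permissive than the criteria in the question, because the final [^- '] accepts any character that is not a separator and nothing stops separators from repeating.

```javascript
// Regex copied from the .match call above.
const looseName = /^[A-Z][-a-zA-Z '\u00E0-\u00EF]*[^- ']$/;

// On success, .match returns an array whose first element is the matched text.
const hit = "Marie-Noëlle Tranchant".match(looseName);
console.log(hit !== null && hit[0]);

// On failure, .match returns null.
console.log("marie".match(looseName));

// .test gives a plain boolean, which is handier for quick probes.
// These made-up inputs break the stated rules, yet both are accepted:
console.log(looseName.test("Marie5"));        // trailing digit
console.log(looseName.test("Marie--Noëlle")); // doubled hyphen
```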
Upvotes: 1
Reputation: 2303
I don't think your regexp will do what you want. It should accept any string that starts with [A-Z] and ends with [a-zA-Z\u00E0-\u00EF] (with any characters in between, including ones you don't want to accept), although I can't say for sure since I don't know how the unescaped '-' is handled...
I think you want something more like this:
new RegExp(/^[A-Z](?:(?!--|''| {2})[a-zA-Z\- '\u00E0-\u00EF])*[a-zA-Z\u00E0-\u00EF]$/);
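A quick side-by-side run illustrates the point. This is a sketch under one assumption: the last alternative in the lookahead is meant to be two spaces (written here as " {2}" so it stays visible), since the other two alternatives are doubled characters. The sample inputs and the names askerRegex and suggestedRegex are made up.

```javascript
// The regex from the question.
const askerRegex = /^[A-Z](?!.*[- ']$).*[a-zA-Z- '\u00E0-\u00EF]$/;

// The regex suggested above, ASSUMING the third lookahead alternative
// is two spaces (spelled " {2}").
const suggestedRegex =
  /^[A-Z](?:(?!--|''| {2})[a-zA-Z\- '\u00E0-\u00EF])*[a-zA-Z\u00E0-\u00EF]$/;

// Made-up probes: one valid name and several that break a rule.
const probes = ["Jean-Luc", "Jean Luc", "J3an", "Jean--Luc", "Jean  Luc", "Jean-"];

for (const s of probes) {
  console.log(JSON.stringify(s), {
    asker: askerRegex.test(s),
    suggested: suggestedRegex.test(s),
  });
}
```

The question's regex lets "J3an" and "Jean--Luc" through because .* matches anything between the first and last character; the suggested one checks every character in between against the allowed set.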
Upvotes: 1