Sparkup

Reputation: 3754

JavaScript regex difficulties

I'm trying to match a string with the following:

I've been trying the following:

new RegExp(/^[A-Z](?!.*[- ']$).*[a-zA-Z- '\u00E0-\u00EF]$/);

My problem isn't understanding what a regular expression does, but verifying that it is correct. It's very easy to write a regex that looks like it should work but misses a few cases.

Any help would be much appreciated.

Edit

Valid strings: Marie-Noëlle Tranchant, Jean-François Copé...

Upvotes: 1

Views: 262

Answers (4)

user557597

Reputation:

Edit - redo
After revisiting this thread, I noticed these comments:

"does not have [- '] next to one another" all 9 possibilities here or just the three of
the same character doubled up? – jswolf19 2 days ago
@jswolf19 do not have: 'space''space', -- or ''. – Stack 101 2 days ago

In light of this, you have to go with what @jswolf19 did.

His regex could probably be simplified a little more:

pcre:
/^[A-Z](?:([\- '])(?!$|\1)|[a-zA-Z\x{E0}-\x{EF}])+$/

js:
/^[A-Z](?:([\- '])(?!$|\1)|[a-zA-Z\u00E0-\u00EF])+$/

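expanded JavaScript (annotated for reading only; JavaScript regex literals have no free-spacing mode, so the whitespace and # comments must be removed before use):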
^                     # start of string
   [A-Z]                     # single A-Z char
   (?:                       # non-capture group
       ([\- '])                   # capture group 1, single char from: [- ']
       (?! $ | \1 )               # not the end of string nor the
                                  #   char captured in group 1 (backreference)
     |                          # OR,
       [a-zA-Z\u00E0-\u00EF]      # a single char from: [a-zA-Z\u00E0-\u00EF]
   )+                        # end non-capture group, do 1 or more times
$                     # end of string
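A quick way to see what the backreference buys you is to run the JavaScript pattern against a few inputs. This is only a minimal sketch: the two names come from the question, the other sample strings are made up for illustration, and the accented letters are written as escapes so the precomposed code points are unambiguous.

```javascript
// Pattern copied from the "js:" line above.
const nameRe = /^[A-Z](?:([\- '])(?!$|\1)|[a-zA-Z\u00E0-\u00EF])+$/;

// Illustrative samples; only the first two come from the question.
const samples = [
  "Marie-No\u00EBlle Tranchant", // Marie-Noëlle Tranchant
  "Jean-Fran\u00E7ois Cop\u00E9", // Jean-François Copé
  "A--b",   // same separator doubled
  "A  b",   // two spaces in a row
  "A-'b",   // two different separators side by side
  "Marie-", // ends with a separator
];

// Pair each sample with the result of testing it against the pattern.
const results = samples.map((s) => [s, nameRe.test(s)]);
for (const [s, ok] of results) {
  console.log(JSON.stringify(s), ok);
}
```

The doubled separators and the trailing separator are rejected, while a mixed pair such as -' is accepted, which is the behaviour the quoted comments ask for.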

Please test answers before you mark them as correct. Others may visit this thread
in the future.

Upvotes: 1

Mike Samuel

Reputation: 120586

/^[A-Z](?:[- ']?[a-zA-Z\u00E0-\u00EF])+$/

Below is a proof of why this meets your criteria. If you change the non-capturing group (?:...) to a capturing group (...), then it is also the shortest regexp that meets your criteria.

starts with [A-Z]

because of the ^[A-Z].

contains [a-zA-Z- '\u00E0-\u00EF] (Latin-1 Supplement - Match Unicode Block Range); any other character is forbidden

because the entire thing must match character sets containing only those characters

does not end with [- '] or have [- '] next to one another.

because [- '] is restricted to zero or one occurrence per following occurrence of [a-zA-Z\u00E0-\u00EF]

has at least 2 characters

because the [A-Z] matches at least one character and the + after the (?:...) group requires another one.
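The argument above can also be spot-checked in a console. The sketch below pairs one or two inputs with each criterion; apart from the two names from the question, the inputs are hypothetical examples chosen to trip exactly one rule each.

```javascript
// Pattern copied from the top of this answer.
const strictRe = /^[A-Z](?:[- ']?[a-zA-Z\u00E0-\u00EF])+$/;

// [input, expected result, criterion being exercised]
const cases = [
  ["Marie-No\u00EBlle Tranchant", true, "valid name from the question"],
  ["Jean-Fran\u00E7ois Cop\u00E9", true, "valid name from the question"],
  ["marie", false, "must start with [A-Z]"],
  ["Marie2", false, "no characters outside the allowed set"],
  ["Marie-", false, "must not end with [- ']"],
  ["Marie--Claire", false, "no doubled separator"],
  ["Marie- Claire", false, "no two separators next to one another"],
  ["M", false, "at least 2 characters"],
  ["Ma", true, "shortest accepted input"],
];

// Collect every case where the regex disagrees with the expectation.
const mismatches = cases.filter(([s, want]) => strictRe.test(s) !== want);
console.log(mismatches.length === 0 ? "all cases agree" : mismatches);
```

Note that this pattern rejects any two adjacent separators, including mixed pairs such as a hyphen followed by a space, which is the stricter reading of the "next to one another" criterion.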

Upvotes: 3

Joseph Marikle

Reputation: 78590

A very basic way to test a regex is to take a literal string, e.g. "blah this is text", and call the .match method on it. You can open a JS console (Ctrl + Shift + J in Chrome) and run it directly to see what it returns:

"Marie-Noëlle Tranchant".match(/^[A-Z][-a-zA-Z '\u00E0-\u00EF]*[^- ']$/);
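The same idea scales to a small list of inputs. Here is a rough sketch, where unexpected is a hypothetical helper (not part of any library) and the "should not match" strings are illustrative guesses at the requirements. .match returns an array on success and null on failure, so the helper simply reports the strings whose outcome differs from what you expected.

```javascript
// Pattern copied from the line above.
const candidateRe = /^[A-Z][-a-zA-Z '\u00E0-\u00EF]*[^- ']$/;

// .match returns an array (full match at index 0) or null.
const hit = "Marie-No\u00EBlle Tranchant".match(candidateRe);
const miss = "Marie-".match(candidateRe);
console.log(hit !== null, miss); // true null

// Hypothetical helper: list every input whose result is not the expected one.
function unexpected(re, shouldMatch, shouldNotMatch) {
  return {
    rejected: shouldMatch.filter((s) => s.match(re) === null),
    accepted: shouldNotMatch.filter((s) => s.match(re) !== null),
  };
}

const report = unexpected(
  candidateRe,
  ["Marie-No\u00EBlle Tranchant", "Jean-Fran\u00E7ois Cop\u00E9"],
  ["Marie-", "A--b", "A1"] // assumed-invalid inputs, for illustration
);
console.log(report);
```

Run this way, the pattern accepts both names and rejects the trailing hyphen, but it also accepts "A--b" and "A1", because the final [^- '] allows any character other than the three separators. That is the kind of gap this sort of testing is meant to surface.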

Upvotes: 1

jswolf19

Reputation: 2303

I don't think your regexp will do what you want. It should accept any string that starts with [A-Z] and ends with [a-zA-Z\u00E0-\u00EF] (with any characters in between, including ones you don't want to accept), although I can't say for sure since I don't know how the unescaped '-' is handled...

I think you want something more like this:

new RegExp(/^[A-Z](?:(?!--|''|  )[a-zA-Z\- '\u00E0-\u00EF])*[a-zA-Z\u00E0-\u00EF]$/);
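To make the difference concrete, the sketch below runs the pattern from the question and the one suggested here over the same inputs. The inputs other than the name from the question are made up for illustration. It also checks the open point about the unescaped '-': placed directly after a complete range such as A-Z, it is treated as a literal hyphen.

```javascript
// Pattern from the question and the revised pattern from this answer.
const originalRe = /^[A-Z](?!.*[- ']$).*[a-zA-Z- '\u00E0-\u00EF]$/;
const revisedRe = /^[A-Z](?:(?!--|''|  )[a-zA-Z\- '\u00E0-\u00EF])*[a-zA-Z\u00E0-\u00EF]$/;

// An unescaped '-' directly after the range A-Z is a literal hyphen.
const dashIsLiteral = /[a-zA-Z- ']/.test("-");
console.log("dash is literal:", dashIsLiteral);

// Illustrative inputs; only the first comes from the question.
const inputs = ["Marie-No\u00EBlle Tranchant", "A1b", "A--b", "A-'b"];

const comparison = inputs.map((s) => ({
  input: s,
  original: originalRe.test(s), // the unrestricted .* lets anything through
  revised: revisedRe.test(s),   // every character must come from the allowed set
}));
console.log(comparison);
```

The original accepts all four inputs, including "A1b" and "A--b". The revised pattern accepts the name, rejects "A1b" and "A--b", and still accepts the mixed pair in "A-'b", since only the three doubled separators are excluded by the lookahead.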

Upvotes: 1
