Jeff Mullins

Reputation: 53

How can I replace all but certain regex patterns in a string?

How do I write a JavaScript regex that title-cases every name in a string except for the following patterns, which should be left alone: [\-\ ][A-Z][a-z]{1,2}[A-Z] and [\-\ ][v][ao][n]?

That is, it should ignore McD, MacD, -McD, -MacD, von and van. I want to "fix" names typed in jumbled case, so that LaToNYA von fRANKENSTEIN McDONALD-MacINTOSH becomes LaTonya von Frankenstein McDonald-MacIntosh.

I use the following for "title casing" (capitalizing the first letter of each name and lower casing the rest of the name):

name = name.replace(/\b\w+/g, function(txt){return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();}); 

Applied to the name above, this produces Latonya Von Frankenstein Mcdonald-Macintosh, which is undesirable, especially if the person typed LaTonya, von, McDonald and MacIntosh and their input is changed against their wishes. How can I adjust my replace call to leave the patterns given above alone? (If the user types latonya, MACDONALD, or VON, I have no problem changing them to Latonya, Macdonald, or Von.)

Upvotes: 1

Views: 55

Answers (1)

Wiktor Stribiżew

Reputation: 626952

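You may use an optional capturing group for the protected prefixes and decide in the callback how to case each word: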

var name = "LaToNYA von fRANKENSTEIN McDONALD-MacINTOSH";
var expected = "LaTonya von Frankenstein McDonald-MacIntosh";
name = name.replace(/\b(v[ao]n|[A-Z][a-z]{1,2}[A-Z])?(\w*)/g, function($0, $1, $2) {
  // Group 1 matched a protected prefix: keep it as typed, lowercase the rest.
  if ($1) return $1 + $2.toLowerCase();
  // Otherwise title-case the whole word (an empty match stays empty).
  return $0.charAt(0).toUpperCase() + $0.slice(1).toLowerCase();
});
console.log(name, " => ", (expected === name ? "identical" : "different"));

Details

  • \b - a word boundary
  • (v[ao]n|[A-Z][a-z]{1,2}[A-Z])? - Group 1 capturing one or zero occurrences of
    • v[ao]n - von or van
    • | - or
    • [A-Z][a-z]{1,2}[A-Z] - an uppercase ASCII letter, 1 or 2 lowercase ones, and an uppercase ASCII letter again
  • (\w*) - Group 2 capturing zero or more word chars
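To see how the pieces listed above divide each name, here is a small illustrative sketch that collects what Group 1 and Group 2 capture for every non-empty match. The variable names (`pattern`, `sample`, `breakdown`) are made up for this example.

```javascript
// Illustration only: record what each group captures for every non-empty match.
const pattern = /\b(v[ao]n|[A-Z][a-z]{1,2}[A-Z])?(\w*)/g;
const sample = "LaToNYA von fRANKENSTEIN McDONALD-MacINTOSH";

const breakdown = [];
for (const m of sample.matchAll(pattern)) {
  // Both groups are optional, so the pattern also matches an empty string
  // at word boundaries (for example before a space or hyphen); skip those.
  if (m[0] === "") continue;
  breakdown.push({ whole: m[0], group1: m[1], group2: m[2] });
}
console.log(breakdown);
```

For LaToNYA, Group 1 holds LaT and Group 2 holds oNYA; for fRANKENSTEIN, Group 1 does not participate (it is undefined) and Group 2 holds the whole word.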

The callback parameters $0, $1 and $2 hold the whole match, the Group 1 value and the Group 2 value, respectively.
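One edge case may be worth checking: because v[ao]n is only a prefix test, a lowercase name that merely starts with van or von (such as vanessa) also lands in Group 1 and keeps its lowercase first letter. A minimal sketch of one way to tighten this, assuming the particle should only be protected when it is a whole word, is to add \b after v[ao]n. The function name `fixNameCase` and the extra \b are assumptions for illustration, not part of the pattern above.

```javascript
// Hypothetical helper; the trailing \b after v[ao]n is an added assumption
// so that only the standalone words "von" / "van" are left untouched.
function fixNameCase(name) {
  return name.replace(
    /\b(v[ao]n\b|[A-Z][a-z]{1,2}[A-Z])?(\w*)/g,
    (whole, prefix, rest) =>
      prefix
        ? prefix + rest.toLowerCase() // protected prefix: keep as typed
        : whole.charAt(0).toUpperCase() + whole.slice(1).toLowerCase()
  );
}

console.log(fixNameCase("LaToNYA von fRANKENSTEIN McDONALD-MacINTOSH"));
// LaTonya von Frankenstein McDonald-MacIntosh
console.log(fixNameCase("vanessa VON trapp"));
// Vanessa Von Trapp
```

Uppercase input such as VON or MACDONALD matches neither alternative in Group 1, so it falls through to plain title casing, which is the behavior the question says is acceptable.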

Upvotes: 1
