Joel Glovacki

Reputation: 813

RegEx to match specific words unless it's the last word in a sentence (titleize)

I'm capitalizing all words, then lowercasing words like "a", "of", and "and". The first and last words should remain capitalized. I've tried using \s instead of \b, which caused some strange issues. I've also tried [^$], but that doesn't seem to mean "not end of string".

function titleize($string){
  return ucfirst(
     preg_replace("/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)\b/uie",
     "strtolower('$1')", 
     ucwords($string))
  );
}

This is the only failing test I'm trying to fix. The "in" at the end should remain capitalized; the comment shows the expected output.

titleize("gotta give up, gotta give in");
//Gotta Give Up, Gotta Give In

These tests pass:

titleize('if i told you this was killing me, would you stop?');
//If I Told You This Was Killing Me, Would You Stop?

titleize("we're at the top of the world (to the simple two)");
//We're at the Top of the World (to the Simple Two)

titleize("and keep reaching for those stars");
//And Keep Reaching for Those Stars

Upvotes: 1

Views: 296

Answers (3)

rubber boots

Reputation: 15204

You apply ucwords() before passing the string to the regex replacement, and then ucfirst() again afterwards (for stop words at the start of the string). This can be shortened by relying on the fact that the first word of the string has no whitespace before it and the last word has none after it. With that convention, a regex like '/(?<=\s)( ... )(?=\s)/' matches only words surrounded by whitespace, which simplifies your function somewhat:

function titleize2($str) {
 $NoUc = Array('A','Of','An','At','The','With','In','To','And','But','Is','For');
 $reg = '/(?<=\s)('      # set lowercase only if surrounded by whitespace
      . join('|', $NoUc) # add OR'ed list of words
      . ')(?=\s)/e';     # set regex-eval mode
 return preg_replace( $reg, 'strtolower("\\1")', ucwords($str) );
}

If tested with:

...
$Strings = Array('gotta give up, gotta give in',
                 'if i told you this was killing me, would you stop?',
                 'we\'re at the top of the world (to the simple two)',
                 'and keep reaching for those stars');

foreach ($Strings as $s)
   print titleize2($s) . "\n";
...

... this will return the correct results.
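
The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7, so on current PHP versions the same idea needs preg_replace_callback(). A minimal sketch, assuming the same word list; the function name titleize_ws is made up for illustration:

```php
<?php
// Hypothetical modern variant of titleize2(): same lookarounds, no /e modifier.
function titleize_ws(string $str): string {
    $noUc = ['A', 'Of', 'An', 'At', 'The', 'With', 'In', 'To', 'And', 'But', 'Is', 'For'];
    // Match a listed word only when whitespace sits on both sides of it,
    // so the first and last words of the string are never touched.
    $reg = '/(?<=\s)(?:' . implode('|', $noUc) . ')(?=\s)/';
    return preg_replace_callback(
        $reg,
        function (array $m): string {
            return strtolower($m[0]); // lowercase the matched word
        },
        ucwords($str)
    );
}

echo titleize_ws('gotta give up, gotta give in'), "\n";
// Gotta Give Up, Gotta Give In
```

Because the lookarounds do not consume the surrounding whitespace, adjacent stop words such as "At The" are both matched.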

Upvotes: 1

Crisp

Reputation: 11447

Adding a negative lookahead for the end of the string, (?!$), should do what you want:

function titleize($string){
  return ucfirst(
     preg_replace("/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)\b(?!$)/uie",
     "strtolower('$1')", 
     ucwords(inflector::humanize($string)))
  );
}
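
On PHP 7 and later, where the /e modifier no longer exists, the same pattern can be used with preg_replace_callback(). This is only a sketch: the name titleize_cb is hypothetical, and the inflector::humanize() call is left out so the snippet runs on its own.

```php
<?php
// Hypothetical callback-based variant; the (?!$) lookahead is unchanged.
function titleize_cb(string $string): string {
    // \b...\b matches whole words; (?!$) rejects a match that ends the string.
    $pattern = '/\b(?:A|Of|An|At|The|With|In|To|And|But|Is|For)\b(?!$)/ui';
    $lowered = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return strtolower($m[0]); // lowercase the matched stop word
        },
        ucwords($string)
    );
    // A stop word at the very start was lowercased above; restore its capital.
    return ucfirst($lowered);
}

echo titleize_cb('gotta give up, gotta give in'), "\n";
// Gotta Give Up, Gotta Give In
```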

Upvotes: 0

morja

Reputation: 8560

Try this regex:

/\b(A|Of|An|At|The|With|In|To|And|But|Is|For)(?!$)\b/uie

The negative lookahead (?!$) excludes matches that are immediately followed by the end of the string.
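
To illustrate why [^$] from the question behaves differently, here is a small comparison. The two helper names are made up for this example. Inside a character class, $ is a literal dollar sign, so [^$] consumes one real character, while (?!$) consumes nothing.

```php
<?php
// Hypothetical helpers: lowercase the word "In" using two different patterns.
function lower_in_class(string $s): string {
    // [^$] must consume one character that is not "$"; that character is
    // part of the match and is therefore replaced along with the word.
    return preg_replace('/\bIn\b[^$]/', 'in', $s);
}

function lower_in_lookahead(string $s): string {
    // (?!$) only asserts that the end of the string does not follow.
    return preg_replace('/\bIn\b(?!$)/', 'in', $s);
}

echo lower_in_class('In Time'), "\n";     // inTime  (the space was consumed)
echo lower_in_lookahead('In Time'), "\n"; // in Time
echo lower_in_lookahead('Give In'), "\n"; // Give In (final word left alone)
```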

Upvotes: 0
