unfulvio

Reputation: 846

Regex in PHP: take all the words after the first one in string and truncate all of them to the first character

I'm quite terrible at regexes.

I have a string that may contain one or more words (generally 2 or 3), usually a person's name, for example:

$str1 = 'John Smith';
$str2 = 'John Doe';
$str3 = 'David X. Cohen';
$str4 = 'Kim Jong Un';
$str5 = 'Bob';

I'd like to convert each as follows:

$str1 = 'John S.';
$str2 = 'John D.';
$str3 = 'David X. C.';
$str4 = 'Kim J. U.';
$str5 = 'Bob';

My guess is that I should first match the first word, like so:

preg_match( '/^([\w\-]+)/', $str1, $first_word );

Then I need all the words after the first one... but how do I match those? Should I call preg_match again and pass offset = 1 in the arguments? But that offset is in characters or bytes, right?

Anyway, after I have matched the words following the first, if they exist, should I do something like this for each of them:

$second_word = substr( $following_word, 0, 1 ) . '. ';

Or is my approach completely wrong?

Thanks

PS: it would be a boon if the regex could keep the first two words whole when the string contains three or more words (e.g. 'Kim Jong U.').

Upvotes: 1

Views: 506

Answers (4)

mickmackusa

Reputation: 48031

Simply start matching from the second character of the last (non-first) word of the string, then replace that match with a dot (no capturing or referencing is needed).

Match a literal space, then \w to match a word character, then forget those characters with \K, then match zero or more word characters until the end of the string. This truncates the last word after its first letter, and the replacement adds a dot at the end.

Code:

$strings = [
    'John Smith',
    'John Doe',
    'David X. Cohen',
    'Kim Jong Un',
    'Bob',
];

var_export(
    preg_replace('/ \w\K\w*$/', '.', $strings)
);
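
To make the behaviour of this pattern easier to see, here is a minimal sketch that applies it to single strings and prints each result. The function name abbreviateLastWord is a hypothetical helper, not part of the answer. Note that, because the pattern is anchored with $, only the last word is shortened, which corresponds to the 'Kim Jong U.' variant from the question's PS rather than 'Kim J. U.'.

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function abbreviateLastWord(string $name): string
{
    // ' \w' matches a space plus the first letter of a word, \K drops both
    // from the reported match, and '\w*$' consumes the rest of that word,
    // which must run up to the end of the string.
    return preg_replace('/ \w\K\w*$/', '.', $name);
}

foreach (['John Smith', 'David X. Cohen', 'Kim Jong Un', 'Bob'] as $name) {
    echo $name, ' => ', abbreviateLastWord($name), PHP_EOL;
}
// John Smith => John S.
// David X. Cohen => David X. C.
// Kim Jong Un => Kim Jong U.
// Bob => Bob
```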

Upvotes: 0

nhahtdh

Reputation: 56829

A simple solution with only look-ahead and word boundary check:

preg_replace('~(?!^)\b(\w)\w+~', '$1.', $string);
  • (\w)\w+ is a word in the name, with the first character captured
  • (?!^)\b performs a word boundary check \b, and makes sure the match is not at the start of the string (?!^).
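
As a quick check of the two points above, the following sketch runs the same pattern over the sample names from the question. The wrapper name abbreviateAfterFirst is an assumption made for illustration only.

```php
<?php
// Hypothetical wrapper around the preg_replace call from this answer.
function abbreviateAfterFirst(string $name): string
{
    // (?!^) rejects a match at the very start of the string, so the first
    // word is never touched; (\w)\w+ needs at least two word characters,
    // so a lone initial such as the 'X' in 'X.' is left as it is.
    return preg_replace('~(?!^)\b(\w)\w+~', '$1.', $name);
}

foreach (['John Smith', 'David X. Cohen', 'Kim Jong Un', 'Bob'] as $name) {
    echo $name, ' => ', abbreviateAfterFirst($name), PHP_EOL;
}
// John Smith => John S.
// David X. Cohen => David X. C.
// Kim Jong Un => Kim J. U.
// Bob => Bob
```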

Upvotes: 0

Avinash Raj

Reputation: 174816

You could use a positive lookbehind assertion.

(?<=\h)([A-Z])\w+

OR

Use this regex if you also want to turn 'Bob F' into 'Bob F.':

(?<=\h)([A-Z])\w*(?!\.)

Then replace the matched characters with \1. (the contents of capture group 1 followed by a literal dot).

The code would look like this:

preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $string);

  • (?<=\h)([A-Z]) captures an uppercase letter that is preceded by a horizontal whitespace character.

  • \w+ matches one or more word characters.

  • Replacing the matched characters with the contents of group 1 (\1) plus a dot gives the desired output.
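
The difference between the two patterns in this answer can be sketched as follows. The function names abbreviateStrict and abbreviateWithInitials are hypothetical, chosen only for this illustration. Both patterns rely on [A-Z], so they assume each word to be shortened starts with an ASCII uppercase letter.

```php
<?php
// Hypothetical helpers; the patterns are the two given in this answer.
function abbreviateStrict(string $name): string
{
    // \w+ needs at least one more character after the capital,
    // so a lone initial is left alone.
    return preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $name);
}

function abbreviateWithInitials(string $name): string
{
    // \w* also accepts a lone initial; (?!\.) refuses a match that is
    // immediately followed by a dot, so 'X.' does not become 'X..'.
    return preg_replace('~(?<=\h)([A-Z])\w*(?!\.)~', '\1.', $name);
}

echo abbreviateStrict('Kim Jong Un'), PHP_EOL;          // Kim J. U.
echo abbreviateStrict('Bob F'), PHP_EOL;                // Bob F
echo abbreviateWithInitials('Bob F'), PHP_EOL;          // Bob F.
echo abbreviateWithInitials('David X. Cohen'), PHP_EOL; // David X. C.
```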

Upvotes: 1

anubhava

Reputation: 785846

It can be done in a single preg_replace call using a regex.

You can search using this regex:

^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+

And replace by:

$1.

Code:

$name = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);

Explanation:

  • (*FAIL) behaves like a failing negative assertion and is a synonym for (?!)
  • (*SKIP) defines a point beyond which the regex engine is not allowed to backtrack when the subpattern fails later
  • (*SKIP)(*FAIL) together provide a neat workaround for the restriction that you cannot use a variable-length lookbehind in the above regex.
  • ^\w+(?:$| +)(*SKIP)(*F) matches the first word in a name and skips it, leaving it unchanged
  • (\w)\w+ matches each of the other words, which is then replaced with its first letter and a dot.
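
The explanation above can be tried out with a small sketch like the one below. The function name abbreviateSkippingFirst is an assumption introduced for this example, and the input list is taken from the question.

```php
<?php
// Hypothetical wrapper around the preg_replace call from this answer.
function abbreviateSkippingFirst(string $name): string
{
    // First alternative: consume the first word (and any following spaces),
    // then (*SKIP)(*F) discards it and resumes matching after it.
    // Second alternative: any later word of two or more characters,
    // with its first letter captured in group 1.
    return preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);
}

foreach (['John Smith', 'David X. Cohen', 'Kim Jong Un', 'Bob'] as $name) {
    echo $name, ' => ', abbreviateSkippingFirst($name), PHP_EOL;
}
// John Smith => John S.
// David X. Cohen => David X. C.
// Kim Jong Un => Kim J. U.
// Bob => Bob
```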

Upvotes: 4
