Reputation: 846
I'm quite terrible at regexes.
I have a string that may contain one or more words (generally 2 or 3), usually a person's name, for example:
$str1 = 'John Smith';
$str2 = 'John Doe';
$str3 = 'David X. Cohen';
$str4 = 'Kim Jong Un';
$str5 = 'Bob';
I'd like to convert each as follows:
$str1 = 'John S.';
$str2 = 'John D.';
$str3 = 'David X. C.';
$str4 = 'Kim J. U.';
$str5 = 'Bob';
My guess is that I should first match the first word, like so:
preg_match('/^([\w-]+)/', $str1, $first_word);
Then I need all the words after the first one, but how do I match those? Should I call preg_match again and pass offset = 1 in the arguments? And is that offset counted in characters or in bytes?
Anyway, once I have matched the words following the first one (if they exist), should I do something like this for each of them:
$second_word = substr( $following_word, 0, 1 ) . '. ';
Or is my approach completely wrong?
Thanks
PS: it would be a boon if the regex could keep the first two words whole when the string contains three or more words (e.g. 'Kim Jong U.').
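The step-by-step idea in the question can also work without a clever regex. Below is a minimal sketch, assuming plain ASCII names; abbreviateName() and its $keepAllButLast flag are hypothetical names introduced here, with the flag covering the PS case. Note that both substr() and the offset argument of preg_match() count bytes, not characters, so names with multibyte characters would need the mb_* functions instead.

```php
<?php
// Hypothetical helper (not from the question): keeps the first word whole and
// turns each later word into its first character followed by a dot.
// With $keepAllButLast = true, only the last word is abbreviated (the PS case).
// substr() counts bytes, so this sketch assumes single-byte (ASCII) names.
function abbreviateName(string $name, bool $keepAllButLast = false): string
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', trim($name), -1, PREG_SPLIT_NO_EMPTY);
    $count = count($words);
    if ($count < 2) {
        return $name; // zero or one word: nothing to abbreviate
    }
    // Index of the first word that should be abbreviated.
    $start = $keepAllButLast ? $count - 1 : 1;
    for ($i = $start; $i < $count; $i++) {
        // Leave words that already end in a dot (e.g. "X.") unchanged.
        if (substr($words[$i], -1) !== '.') {
            $words[$i] = substr($words[$i], 0, 1) . '.';
        }
    }
    return implode(' ', $words);
}

foreach (['John Smith', 'John Doe', 'David X. Cohen', 'Kim Jong Un', 'Bob'] as $name) {
    echo abbreviateName($name), ' | ', abbreviateName($name, true), PHP_EOL;
}
```

Splitting first and rebuilding with implode() sidesteps the offset question entirely, at the cost of normalizing runs of spaces to a single space.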
Upvotes: 1
Views: 506
Reputation: 48031
Simply start matching from the second character of the last (non-first) word of the string, then replace that match with a dot (no capturing or back-referencing is needed).
Match a literal space, then \w to match a word character, then forget those characters with \K, then match zero or more word characters until the end of the string. This truncates every last name after its first letter, the replacement adds the dot, and any middle words stay whole (so 'Kim Jong Un' becomes 'Kim Jong U.', as the PS asks).
Code:
$strings = [
'John Smith',
'John Doe',
'David X. Cohen',
'Kim Jong Un',
'Bob',
];
var_export(
preg_replace('/ \w\K\w*$/', '.', $strings)
);
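To illustrate what \K saves, here is a sketch comparing the answer's pattern with an equivalent capture-group version that writes the space and first letter back through $1. The capture-group variant is an alternative written for comparison, not part of the original answer; both calls should produce the same array.

```php
<?php
$strings = ['John Smith', 'John Doe', 'David X. Cohen', 'Kim Jong Un', 'Bob'];

// \K version from the answer: the space and first letter are matched but
// dropped from the reported match, so only the rest of the last word is replaced.
$withK = preg_replace('/ \w\K\w*$/', '.', $strings);

// Capture-group version: the space and first letter are part of the match,
// so they must be written back through $1.
$withGroup = preg_replace('/( \w)\w*$/', '$1.', $strings);

var_export($withK === $withGroup); // prints true
echo PHP_EOL;
echo implode(PHP_EOL, $withK), PHP_EOL;
```

The \K form is shorter and avoids the back-reference, which is the point the answer makes.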
Upvotes: 0
Reputation: 56829
A simple solution using only a lookahead and a word-boundary check:
preg_replace('~(?!^)\b(\w)\w+~', '$1.', $string);
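A quick sketch of running this one-liner over the question's sample names (the $names array is assumed here; preg_replace also accepts an array as the subject). A single-letter word such as X is left alone, because \w+ needs at least one more word character after the captured letter.

```php
<?php
$names = ['John Smith', 'John Doe', 'David X. Cohen', 'Kim Jong Un', 'Bob'];

// (?!^)\b : a word boundary that is not at the start of the string
// (\w)\w+ : a word of at least two characters; its first letter goes into $1
$results = preg_replace('~(?!^)\b(\w)\w+~', '$1.', $names);

echo implode(PHP_EOL, $results), PHP_EOL;
```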
(\w)\w+ is a word in the name, with its first character captured.
(?!^)\b performs a word-boundary check (\b) and makes sure the match does not start at the beginning of the string ((?!^)).
Upvotes: 0
Reputation: 174816
You could use a positive lookbehind assertion.
(?<=\h)([A-Z])\w+
OR
Use this regex if you want to turn 'Bob F' into 'Bob F.':
(?<=\h)([A-Z])\w*(?!\.)
Then replace the matched characters with \1. (the captured letter followed by a dot).
The code would look like this:
preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $string);
(?<=\h)([A-Z]) captures every uppercase letter that is preceded by a horizontal whitespace character.
\w+ matches one or more word characters.
Replacing the matched characters with the contents of capture group 1 (\1) plus a dot gives the desired output.
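A sketch comparing the two patterns from this answer on the sample names plus 'Bob F' (the input array is assumed for illustration). The first pattern needs at least two characters after the space, so a lone initial is untouched; the second accepts a lone letter, and its (?!\.) lookahead keeps an existing initial such as "X." from becoming "X..".

```php
<?php
$names = ['John Smith', 'David X. Cohen', 'Kim Jong Un', 'Bob', 'Bob F'];

// First pattern: an uppercase letter after horizontal whitespace, followed by
// one or more word characters. A lone initial such as "F" is not matched.
$plain = preg_replace('~(?<=\h)([A-Z])\w+~', '\1.', $names);

// Second pattern: \w* also accepts a lone letter, and (?!\.) rejects a letter
// that is directly followed by a dot, so "X." is left as it is.
$withInitials = preg_replace('~(?<=\h)([A-Z])\w*(?!\.)~', '\1.', $names);

var_export($plain);
echo PHP_EOL;
var_export($withInitials);
echo PHP_EOL;
```

Because of the [A-Z] class, both patterns only abbreviate words that start with an ASCII uppercase letter.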
Upvotes: 1
Reputation: 785846
It can be done with a single preg_replace call using a regex.
You can search using this regex:
^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+
And replace with:
$1.
Code:
$name = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $name);
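To see why the first alternative is needed, here is a sketch that runs the answer's regex on the question's samples next to a naive version without the (*SKIP)(*F) branch (the naive pattern is added here only for comparison). Without the skip branch, the first word gets abbreviated too.

```php
<?php
$names = ['John Smith', 'John Doe', 'David X. Cohen', 'Kim Jong Un', 'Bob'];

// Left branch: match the first word plus trailing spaces, then (*SKIP)(*F)
// discards that match and resumes searching right after it.
// Right branch: any other word of 2+ characters, with its first letter captured.
$abbreviated = preg_replace('/^\w+(?:$| +)(*SKIP)(*F)|(\w)\w+/', '$1.', $names);

// Without the skip branch, every word of 2+ characters is abbreviated.
$naive = preg_replace('/(\w)\w+/', '$1.', $names);

echo implode(PHP_EOL, $abbreviated), PHP_EOL;
echo implode(PHP_EOL, $naive), PHP_EOL;
```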
Explanation:
(*FAIL) (abbreviated (*F)) behaves like a failing negative assertion and is a synonym for (?!).
(*SKIP) marks a point the engine may not backtrack past: if the rest of the pattern fails after it, the whole match attempt fails and the next attempt starts at that point.
Together, (*SKIP)(*FAIL) provide a nice alternative to the restriction that you cannot use a variable-length lookbehind in the regex above.
^\w+(?:$| +)(*SKIP)(*F) matches the first word of the name and skips it (it is never replaced).
(\w)\w+ matches every other word of two or more characters and replaces it with its first letter and a dot.
Upvotes: 4