Reputation: 496
I have an input string:
$subject = "This punctuation! And this one. Does n't space that one.";
I also have an array containing exceptions to the replacement I wish to perform, currently with one member:
$exceptions = array(
    0 => "n't"
);
I am after a general solution rather than a hard-coded one because this array will be extended in future and could potentially include hundreds of members.
I would like to insert whitespace at word boundaries (duplicate whitespace will be removed later). Certain boundaries should be ignored, though. For example, the exclamation mark and full stops in the above sentence should be surrounded with whitespace, but the apostrophe should not. Once duplicate whitespace is removed from the final result with trim(preg_replace('/\s+/', ' ', $subject)); it should look like this:
"This punctuation ! And this one . Does n't space that one ."
I am working on a solution as follows:
1. Use preg_match_all('/\b/', $subject, $offsets, PREG_OFFSET_CAPTURE); to gather an array of indexes where whitespace may be inserted.
2. Iterate over the $offsets array.
3. Take the substring of $subject from whitespace before the current offset until the next whitespace or end of line.
4. Check that substring against the $exceptions array.

So far I have the following code:
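$subject = "This punctuation! And this one. Does n't space that one.";
$exceptions = array("n't");
$pattern = '/\b/';
// preg_match_all() collects every boundary; preg_match() would stop at the first one.
preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);
$offsets = $matches[0]; // each entry is array(matched text, byte offset)
if (count($offsets)) {
    $indexes = array();
    for ($i = 0; $i < count($offsets); $i++) {
        $offset = $offsets[$i][1];
        $substring = '?';
        // Replace $substring with substring from after whitespace prior to $offset until next whitespace...
        // in_array() instead of array_search(): the latter returns index 0 (falsy) for the first exception.
        if (!in_array($substring, $exceptions, true)) {
            $indexes[] = $offset;
        }
    }
    // Insert whitespace character at each offset stored in $indexes...
}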
I can't find an appropriate way to create the $substring variable in order to complete the above example.
Upvotes: 0
Views: 353
Reputation: 91518
$res = preg_replace("/(?:n't|ALL EXCEPTIONS PIPE SEPARATED)(*SKIP)(*F)|(?!^)(?<!\h)\b(?!\h)/", " ", $subject);
echo $res;
Output:
This punctuation ! And this one . Doesn't space that one .
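Since the exceptions list may grow to hundreds of entries, the "ALL EXCEPTIONS PIPE SEPARATED" part can probably be generated from the array instead of typed by hand. Below is a minimal sketch of that idea, not part of the original answer: the function name spaceBoundaries is hypothetical, and it assumes every exception is a literal string, so each one is escaped with preg_quote() before being joined with "|".

```php
<?php
// Hypothetical helper: builds the (*SKIP)(*F) alternation from an array of literal exceptions.
function spaceBoundaries(string $subject, array $exceptions): string
{
    // Try longer exceptions first so a long entry is not shadowed by a shorter prefix.
    usort($exceptions, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape regex metacharacters and the "/" delimiter in every exception.
    $quoted = array_map(function ($e) {
        return preg_quote($e, '/');
    }, $exceptions);

    // With no exceptions, fall back to the plain boundary pattern.
    $skip = $quoted ? '(?:' . implode('|', $quoted) . ')(*SKIP)(*F)|' : '';
    $pattern = '/' . $skip . '(?!^)(?<!\h)\b(?!\h)/';

    return preg_replace($pattern, ' ', $subject);
}

echo spaceBoundaries("This punctuation! And this one. Doesn't space that one.", array("n't"));
// This punctuation ! And this one . Doesn't space that one .
```

Because the pattern never inserts a space next to existing horizontal whitespace, the result should not need the trim(preg_replace('/\s+/', ' ', ...)) clean-up step from the question, although running it afterwards does no harm.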
Upvotes: 2
Reputation: 14927
One "easy" (but not necessarily fast, depending on how many exceptions you have) solution would be to first replace all the exceptions in the string with something unique that doesn't contain any punctuation, then perform your replacements, then convert back the unique replacement strings into their original versions.
Here's an example using md5 (but could be lots of other things):
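$subject = "This punctuation! And this one. Doesn't space that one.";
$exceptions = ["n't"];
// Start from the subject and keep working on $result, so that every exception gets masked
// (reading from $subject on each pass would keep only the last replacement).
$result = $subject;
foreach ($exceptions as $exception) {
    $result = str_replace($exception, md5($exception), $result);
}
// Put a space before every character that is not a letter, digit, or whitespace.
$result = preg_replace('/[^a-z0-9\s]/i', ' \0', $result);
// Restore the original exceptions.
foreach ($exceptions as $exception) {
    $result = str_replace(md5($exception), $exception, $result);
}
echo $result; // This punctuation ! And this one . Doesn't space that one .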
Upvotes: 0