Muckee

Reputation: 496

PHP: How to extract a substring from a specified index until the next whitespace or end of line

I have an input string:

$subject = "This punctuation! And this one. Does n't space that one.";

I also have an array containing exceptions to the replacement I wish to perform, currently with one member:

$exceptions = array(
  0 => "n't"
);

I need a general solution rather than a hard-coded one because this array will be extended in future and could potentially include hundreds of members.

I would like to insert whitespace at word boundaries (duplicate whitespace will be removed later), but certain boundaries should be ignored. For example, the exclamation mark and full stops in the sentence above should be surrounded by whitespace, but the apostrophe should not. Once duplicate whitespace has been collapsed with trim(preg_replace('/\s+/', ' ', $subject));, the final result should look like this:

"This punctuation ! And this one . Does n't space that one ."

I am working on a solution as follows:

  1. Use preg_match_all('/\b/', $subject, $offsets, PREG_OFFSET_CAPTURE); to gather an array of indexes where whitespace may be inserted (preg_match() would only return the first boundary).

  2. Iterate over the $offsets array.

So far I have the following code:

$subject = "This punctuation! And this one. Does n't space that one.";
$pattern = '/\b/';
// preg_match_all() collects every boundary; preg_match() stops at the first one.
preg_match_all($pattern, $subject, $offsets, PREG_OFFSET_CAPTURE);

if (count($offsets[0])) {
  $indexes = array();
  foreach ($offsets[0] as $match) {
    $offset = $match[1]; // each entry is array(matched text, byte offset)
    $substring = '?';

    // Replace $substring with substring from after whitespace prior to $offset until next whitespace...

    // in_array() avoids the array_search() pitfall where index 0 is treated as false.
    if (!in_array($substring, $exceptions, true)) {
      $indexes[] = $offset;
    }
  }

  // Insert whitespace character at each offset stored in $indexes...

}

I can't find an appropriate way to create the $substring variable in order to complete the above example.

Upvotes: 0

Views: 353

Answers (2)

Toto

Reputation: 91518

$res = preg_replace("/(?:n't|ALL EXCEPTIONS PIPE SEPARATED)(*SKIP)(*F)|(?!^)(?<!\h)\b(?!\h)/", " ", $subject);
echo $res;

Output:

This punctuation ! And this one . Doesn't space that one .
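
If the exceptions live in an array, the alternation at the front of that pattern can be generated rather than typed by hand. The following is only a sketch of how that might look: spaceBoundaries() is a made-up name, not part of the answer above, and it assumes each exception should be matched literally, hence preg_quote().

```php
<?php
// Hypothetical helper: builds the skip-pattern from an array of exceptions.
function spaceBoundaries(string $subject, array $exceptions): string
{
    // Boundary that is not at the start and does not touch horizontal whitespace.
    $boundary = '(?!^)(?<!\h)\b(?!\h)';

    if ($exceptions === []) {
        // No exceptions: space every qualifying boundary.
        return preg_replace('/' . $boundary . '/', ' ', $subject);
    }

    // Escape regex metacharacters (and the "/" delimiter) in each exception.
    $quoted = array_map(function ($e) {
        return preg_quote($e, '/');
    }, $exceptions);

    // Anything matching an exception is skipped; remaining boundaries get a space.
    $pattern = '/(?:' . implode('|', $quoted) . ')(*SKIP)(*F)|' . $boundary . '/';

    return preg_replace($pattern, ' ', $subject);
}

echo spaceBoundaries("This punctuation! And this one. Doesn't space that one.", ["n't"]);
// This punctuation ! And this one . Doesn't space that one .
```

Note that a subject ending in a word character picks up a trailing space from the final boundary, so the trim() step mentioned in the question is still needed.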


Upvotes: 2

Jeto

Reputation: 14927

One "easy" (but not necessarily fast, depending on how many exceptions you have) solution would be to first replace every exception in the string with a unique placeholder that doesn't contain any punctuation, then perform your replacements, then convert the placeholders back into the original exceptions.

Here's an example using md5() to generate the placeholders (any other unique, punctuation-free token would work too):

$subject = "This punctuation! And this one. Doesn't space that one.";

$exceptions = ["n't"];

$result = $subject;
foreach ($exceptions as $exception) {
    // Work on $result so that earlier substitutions are not overwritten.
    $result = str_replace($exception, md5($exception), $result);
}

$result = preg_replace('/[^a-z0-9\s]/i', ' \0', $result);

foreach ($exceptions as $exception) {
    $result = str_replace(md5($exception), $exception, $result);
}

echo $result;  // This punctuation ! And this one . Doesn't space that one .
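
With hundreds of exceptions, the two loops could probably be collapsed into two strtr() calls. This is a rough sketch under that assumption; spacePunctuation() is a hypothetical name, and md5() is kept only because its output is hex digits and therefore contains no punctuation. When given an array, strtr() tries the longest keys first and does not re-scan text it has already replaced, so overlapping exceptions such as "n't" and "won't" do not interfere with each other.

```php
<?php
// Hypothetical helper illustrating the placeholder approach with strtr().
function spacePunctuation(string $subject, array $exceptions): string
{
    // Map each exception to an alphanumeric placeholder.
    $toPlaceholder = [];
    foreach ($exceptions as $exception) {
        $toPlaceholder[$exception] = md5($exception);
    }

    // Swap all exceptions for their placeholders in a single pass.
    $result = strtr($subject, $toPlaceholder);

    // Put a space in front of every character that is not a letter, digit or whitespace.
    $result = preg_replace('/[^a-z0-9\s]/i', ' \0', $result);

    // Swap the placeholders back for the original exceptions.
    return strtr($result, array_flip($toPlaceholder));
}

echo spacePunctuation("Doesn't it? She won't.", ["n't", "won't"]);
// Doesn't it ? She won't .
```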


Upvotes: 0
