Tom

Reputation: 1227

Split a string on different substrings, but conserve those substrings

I'm trying to split the following string:

Hello how are you<br>Foo bar hello

Into

"Hello", " how", " are", " you", "<br>", "Foo", " bar", " hello"

Is this possible?

Upvotes: 1

Views: 151

Answers (5)

nickb

Reputation: 59699

Don't make things harder than you have to. Use preg_split() with the PREG_SPLIT_DELIM_CAPTURE flag, and capture the <br>:

$str = 'Hello how are you<br>Foo bar hello';
$array = preg_split( '/\s+|(<br>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r( $array);

Output:

Array 
( 
    [0] => Hello 
    [1] => how 
    [2] => are 
    [3] => you 
    [4] => <br> 
    [5] => Foo 
    [6] => bar 
    [7] => hello
)
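To make the role of the flag concrete, here is a small side-by-side sketch (the variable names are only illustrative). Without PREG_SPLIT_DELIM_CAPTURE every delimiter match is thrown away, including the <br>; with it, only the text matched by the parenthesized group is returned, which is why the whitespace still disappears:

```php
<?php
$str = 'Hello how are you<br>Foo bar hello';

// Without the flag, every delimiter match is discarded, including <br>.
$withoutFlag = preg_split('/\s+|(<br>)/', $str);

// With the flag, text matched by the capture group comes back as its own element.
$withFlag = preg_split('/\s+|(<br>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($withoutFlag); // 7 elements, no <br>
print_r($withFlag);    // 8 elements, <br> at index 4
```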

Edit: To include the space in the following token, you can use an assertion:

$array = preg_split( '/(?:\s*(?=\s))|(<br>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

So, the goal of preg_split() is to find a spot in the string to split. The regex we use consists of two parts, OR'd together with |:

  1. (?:\s*(?=\s)). This is a non-capturing group, (?:...), because we do not want whatever this part matches returned to us. Inside the group is \s*(?=\s), which says "match zero or more whitespace characters, but assert that the next character is a whitespace character". Looking at our input string, this makes sense:

    Hello how are you<br>Foo bar hello
         ^   ^
    

    The regex scans from left to right, reaches "Hello{space}how", and decides where to split. It tries to match \s*, with the restriction that at least one whitespace character must be left over for the lookahead. With a single space, \s* therefore matches nothing and the split happens at the zero-width position just before the space, so the first piece is just "Hello" and the space stays attached to what follows. It then continues with " how are you<br>Foo bar hello", finds the same kind of match in front of " are", and splits off " how". This repeats until there are no matches left.

  2. Capture <br>, with (<br>). It is captured because when we match this, we want to keep it in the output, so capturing it along with the PREG_SPLIT_DELIM_CAPTURE causes it to be returned to us when it is matched (instead of being completely consumed).

This results in:

array(8) 
{ 
    [0]=> string(5) "Hello" 
    [1]=> string(4) " how" 
    [2]=> string(4) " are" 
    [3]=> string(4) " you" 
    [4]=> string(4) "<br>" 
    [5]=> string(3) "Foo" 
    [6]=> string(4) " bar" 
    [7]=> string(6) " hello" 
}
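For reuse, the lookahead variant can be wrapped in a small helper. This is only a sketch, not part of the pattern above: the function name split_keep_tag is hypothetical, preg_quote() is used so the tag is matched literally, and PREG_SPLIT_NO_EMPTY is added on the assumption that empty pieces are unwanted (they may otherwise appear when the string starts with the tag or contains runs of whitespace).

```php
<?php
/**
 * Hypothetical helper: split on whitespace boundaries and on $tag,
 * keeping the tag as its own element and the leading space on each word.
 */
function split_keep_tag(string $str, string $tag = '<br>'): array
{
    // preg_quote() escapes regex metacharacters in the tag; '/' is the pattern delimiter.
    $pattern = '/(?:\s*(?=\s))|(' . preg_quote($tag, '/') . ')/';

    // DELIM_CAPTURE returns the captured tag; NO_EMPTY drops zero-length pieces.
    $flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
    $parts = preg_split($pattern, $str, -1, $flags);

    // preg_split() returns false on failure (for example, a malformed pattern).
    return $parts === false ? [] : $parts;
}

var_dump(split_keep_tag('Hello how are you<br>Foo bar hello'));
```

For the string from the question this should give the same eight elements as the var_dump() output above.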

Upvotes: 3

Ryan

Reputation: 14649

Borrowing the preg_split pattern from @nickb's answer:

<?php
$string = 'Hello how are you<br>Foo bar hello';

$array = preg_split('/\s/', $string);
$result = array(); // initialise before appending inside the loop

foreach($array as $key => $value) {
    $a = preg_split( '/\s+|(<br>)/', $value, -1, PREG_SPLIT_DELIM_CAPTURE);
    if(is_array($a)) {
        foreach($a as $key2 => $value2) {
            $result[] = $value2;
        }
    }
}

print_r($result);
?>

Output:

Array
(
    [0] => Hello
    [1] => how
    [2] => are
    [3] => you
    [4] => <br>
    [5] => Foo
    [6] => bar
    [7] => hello
)

Upvotes: 0

Charaf JRA

Reputation: 8334

Here is a brief solution: replace <br> with " <br> " (the tag padded with a space on each side), then split on spaces:

<?php
   $newStr=str_replace("<br>"," <br> ","Hello how are you<br>Foo bar hello");
   $str= explode(' ',$newStr);   
?>

Output of print_r($str):

(
    [0] => Hello
    [1] => how
    [2] => are
    [3] => you
    [4] => <br>
    [5] => Foo
    [6] => bar
    [7] => hello
)
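One caveat: explode(' ', ...) yields empty strings wherever two spaces are adjacent, which happens if the input already has a space next to <br> or contains repeated spaces. A minimal sketch of a more tolerant variant, assuming empty tokens are unwanted, swaps explode() for preg_split() with PREG_SPLIT_NO_EMPTY. The input string here is an illustrative variation, not the one from the question:

```php
<?php
// Illustrative input (an assumption): spaces already surround the tag.
$input = 'Hello how are you <br> Foo bar hello';

// Pad the tag so it always stands apart from neighbouring words.
$padded = str_replace('<br>', ' <br> ', $input);

// Split on runs of whitespace and drop empty pieces.
$tokens = preg_split('/\s+/', $padded, -1, PREG_SPLIT_NO_EMPTY);

print_r($tokens);
```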

Upvotes: 0

anon

Reputation:

This is how I'd do it:

  1. Explode the string with space as a delimiter
  2. Loop through the parts
  3. Use strpos and check if part contains the given tag -- <br> in this case
  4. If it does, explode the string again with the tag as the delimiter
  5. Push the pieces into the result array, with the tag between them
  6. If it doesn't, then push it into the result array

Code:

$str = 'Hello how are you<br>Foo bar hello';
$parts = explode(' ', $str);
$result = array();

foreach ($parts as $part) {
    if (strpos($part, '<br>') !== false) {
        // Re-insert the tag between the pieces it separated,
        // so the original order is preserved.
        foreach (explode('<br>', $part) as $i => $piece) {
            if ($i > 0) {
                $result[] = '<br>';
            }
            $result[] = $piece;
        }
    }
    else {
        $result[] = $part;
    }
}
print_r($result);

Output:

Array
(
    [0] => Hello
    [1] => how
    [2] => are
    [3] => you
    [4] => <br>
    [5] => Foo
    [6] => bar
    [7] => hello
)


Upvotes: 0

Yuriy Babenko

Reputation: 178

Not pretty, but simple enough:

$data = 'Hello how are you<br>Foo bar hello';
$split = array();
foreach (explode('<br>', $data) as $line) {
  // Assign the merged array back; appending with [] would nest arrays
  $split = array_merge($split, explode(' ', $line));
  $split[] = '<br>';
}
array_pop($split);

print_r($split);

Or version 2:

$data = 'Hello how are you<br>Foo bar hello';
$data = preg_replace('#\s|(<br>)#', '**$1**', $data);
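// array_filter() drops the empty pieces but keeps the original keys; array_values() reindexes from 0
$split = array_values(array_filter(explode('**', $data)));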
print_r($split);

Upvotes: 1
