caramba

Reputation: 22480

split string by spaces and colon but not if inside quotes

Given a string like this:

$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

the desired result is:

[0] => Array (
    [0] => dateto:'2015-10-07 15:05'
    [1] => xxxx
    [2] => datefrom:'2015-10-09 15:05'
    [3] => yyyy
    [4] => asdf
)

what I get with:

preg_match_all("/\'(?:[^()]|(?R))+\'|'[^']*'|[^(),\s]+/", $str, $m);

is:

[0] => Array (
    [0] => dateto:'2015-10-07
    [1] => 15:05'
    [2] => xxxx
    [3] => datefrom:'2015-10-09
    [4] => 15:05'
    [5] => yyyy
    [6] => asdf
)

I also tried preg_split("/[\s]+/", $str), but I have no idea how to keep it from splitting on whitespace inside quotes. Can anyone show me how, and please explain the regex? Thank you!

Upvotes: 5

Views: 1361

Answers (3)

Casimir et Hippolyte

Reputation: 89557

Often, when you want to split a string, preg_split isn't the best approach (that seems a little counterintuitive, but it's true most of the time). A more efficient way is to find all the items (with preg_match_all) using a pattern that describes everything that is not the delimiter (whitespace here):

$pattern = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

if (preg_match_all($pattern, $str, $m))
    $result = $m[0];

pattern details:

~                    # pattern delimiter

(?=\S)               # the lookahead succeeds only if the character at the
                     # current position is not whitespace.
                     # (This lookahead is useful for two reasons:
                     #    - it lets the regex engine find the start of the next
                     #      item quickly, without having to test each branch of
                     #      the following alternation at every position in the
                     #      string until one succeeds.
                     #    - it ensures that a match contains at least one
                     #      non-whitespace character. Without it, the pattern
                     #      could match an empty string.
                     # )

[^'"\s]*             # anything that is not a quote or whitespace

(?:                  # optional quoted parts
    '[^']*' [^'"\s]*     # single quotes, then any trailing unquoted characters
  |
    "[^"]*" [^'"\s]*     # double quotes, then any trailing unquoted characters
)*
~


Note that with this slightly long pattern, the five items of your example string are found in only 60 steps. You can also use this shorter, simpler pattern:

~(?:[^'"\s]+|'[^']*'|"[^"]*")+~

but it's a little less efficient.
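As a quick check, here is a minimal sketch that runs both patterns against the example string from the question and compares the results. The variable names ($long, $short, $m1, $m2) are only illustrative; it says nothing about the step counts, only that both patterns return the same items for this input.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

// Longer pattern: lookahead plus unrolled quoted parts.
$long = <<<'EOD'
~(?=\S)[^'"\s]*(?:'[^']*'[^'"\s]*|"[^"]*"[^'"\s]*)*~
EOD;

// Shorter pattern: one alternation repeated.
$short = <<<'EOD'
~(?:[^'"\s]+|'[^']*'|"[^"]*")+~
EOD;

preg_match_all($long, $str, $m1);
preg_match_all($short, $str, $m2);

print_r($m1[0]);                   // the five items from the question
var_dump($m1[0] === $m2[0]);       // bool(true): same items, same order
```

The nowdoc syntax (<<<'EOD') is used so that neither quote character needs escaping inside the pattern.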

Upvotes: 2

Pedro Lobito

Reputation: 98901

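For your example, you can use preg_split with a negative lookbehind, (?<!\d), so that whitespace preceded by a digit is not treated as a split point. This works here only because the space inside each quoted value follows a digit: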

<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";
$matches = preg_split('/(?<!\d)(\s)/', $str);
print_r($matches);

Output:

    Array
    (
        [0] => dateto:'2015-10-07 15:05'
        [1] => xxxx
        [2] => datefrom:'2015-10-09 15:05'
        [3] => yyyy
        [4] => asdf
    )

Demo:

http://ideone.com/EP06Nt


Regex Explanation:

(?<!\d)(\s)

Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\d)»
   Match a single character that is a “digit” «\d»
Match the regex below and capture its match into backreference number 1 «(\s)»
   Match a single character that is a “whitespace character” «\s»
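
Because the lookbehind tests for a preceding digit rather than for quotes, the result depends on the shape of the data. The sketch below uses a hypothetical input (not from the question) in which an unquoted token ends in a digit and a quoted value contains a space after a letter, to show where the pattern stops matching the intent:

```php
<?php
// Hypothetical input, chosen to show the limits of the digit lookbehind.
$str = "id:42 name:'John Smith' done";

$parts = preg_split('/(?<!\d)(\s)/', $str);
print_r($parts);
// Array
// (
//     [0] => id:42 name:'John
//     [1] => Smith'
//     [2] => done
// )
```

The space after 42 is not split because it follows a digit, while the space inside the quotes is split because it follows a letter. For input like this, a quote-aware pattern is the safer choice.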

Upvotes: 0

Avinash Raj

Reputation: 174706

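I would use the PCRE verbs (*SKIP)(*F). The first alternative matches a complete quoted string and then deliberately fails, so only whitespace outside quotes is left to act as the delimiter: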

preg_split("~'[^']*'(*SKIP)(*F)|\s+~", $str);
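
For readers who want the pattern spelled out, here is a minimal sketch of the same regex written with the x (extended) flag so that each part can carry a comment. The variable names are illustrative only.

```php
<?php
$str = "dateto:'2015-10-07 15:05' xxxx datefrom:'2015-10-09 15:05' yyyy asdf";

$pattern = <<<'EOD'
~
  '[^']*'      # match a complete single-quoted run
  (*SKIP)(*F)  # then force a failure and forbid retrying inside that run
|
  \s+          # otherwise, a run of whitespace is the delimiter
~x
EOD;

$parts = preg_split($pattern, $str);
print_r($parts);
// Array
// (
//     [0] => dateto:'2015-10-07 15:05'
//     [1] => xxxx
//     [2] => datefrom:'2015-10-09 15:05'
//     [3] => yyyy
//     [4] => asdf
// )
```

(*F) makes the first branch fail, and (*SKIP) tells the engine to resume searching after the text that branch consumed, so whitespace inside the quotes is never tested against \s+. Note that this version only protects single quotes, as in the original pattern.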


Upvotes: 5
