Chewy

Reputation: 196

How to use a regular expression with preg_match_all to split a string into blocks following a pattern

I'm going to be working with a long string of data that is serialized into blocks using a pattern (x:y).

However, I struggle with regular expressions, and I am looking for resources to help me work out how to construct a regex that identifies every one of these blocks as they appear in a string.

For example, given the following string:

$s = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';

Note: the f:json at the end is missing a leading space on purpose, because the format may vary depending on how the string is eventually given to me. Blocks might be separated by spaces, or they might not.

How would I identify each x:y block so that I end up with the result below?

Array
(
    [0] => t:user
    [1] => c:red
    [2] => t:admin
    [3] => n:"bob doe"
    [4] => s:expressions
    [5] => f:json
)

I've tested various expressions using my limited knowledge, but have not been terribly successful.

I can successfully match the pattern using something like this:

^[ctrns]:.+

But this unfortunately matches the entire string. The part I seem to be missing is how to end each block at the right place, while still keeping spaces inside quoted values (see the n:"bob doe" example).
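A minimal sketch of the behaviour described above, assuming PHP's preg_match_all and the sample string from earlier. The variable names $greedy and $naive, and the \S+ variant, are illustrative additions rather than part of the original attempt.

```php
<?php
$s = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';

// .+ is greedy and . also matches spaces, so a single match
// runs from the first key to the end of the string.
preg_match_all('/^[ctrns]:.+/', $s, $greedy);
print_r($greedy[0]);

// Swapping .+ for \S+ stops at whitespace, but it cuts the quoted
// value at its inner space and cannot split the unspaced tail.
preg_match_all('/[ctrns]:\S+/', $s, $naive);
print_r($naive[0]);
```

The first call returns one element containing the whole string; the second returns five elements, including the truncated n:"bob and the unsplit s:expressionsf:json.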

Any assistance would be super appreciated! Ideally, any answer would also explain what each token in the expression accomplishes, so that I can better my understanding of these techniques.

I've been using https://regexr.com/ to practice.

Upvotes: 1

Views: 32

Answers (1)

anubhava

Reputation: 785611

You may use this regex in preg_match_all:

[ctnsf]:(?:"[^"\\]*(?:\\.[^"\\]*)*"|\S+?(?=[ctnsf]:|\s|$))

RegEx Details:

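  • [ctnsf]:: Match one of the characters c, t, n, s, f, followed by :
  • (?:"[^"\\]*(?:\\.[^"\\]*)*": Open a non-capturing group; its first alternative matches a double-quoted substring, including backslash-escaped characters such as \"
  • |: OR
  • \S+?: Match 1+ non-whitespace characters (non-greedy)
  • (?=[ctnsf]:|\s|$): Positive lookahead asserting that what follows is the start of another key ([ctnsf]:), a whitespace character, or the end of the input; the final ) closes the group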
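The breakdown above can also be written into the pattern itself. This is a sketch, assuming PCRE's x (extended) flag, a nowdoc so backslashes need no extra escaping, and ~ as the delimiter; the variable names and the second test string are illustrative.

```php
<?php
// Same pattern as above, spread over several lines with the x flag so
// that whitespace is ignored and # starts a comment.
$re = <<<'REGEX'
~
  [ctnsf]:                       # key: one of c, t, n, s, f, then a colon
  (?:
      "[^"\\]*(?:\\.[^"\\]*)*"   # quoted value, backslash escapes allowed
    |
      \S+?                       # bare value: as few non-space chars as possible
      (?=[ctnsf]:|\s|$)          # stop before the next key, whitespace, or end
  )
~x
REGEX;

$str = 't:user c:red t:admin n:"bob doe" s:expressionsf:json';
preg_match_all($re, $str, $plain);
print_r($plain[0]);

// A value containing an escaped quote stays in one piece.
$str2 = 'n:"bob \"doe" f:json';
preg_match_all($re, $str2, $escaped);
print_r($escaped[0]);
```

Note that for the first string the lazy \S+? only stops where a key character is directly followed by a colon, so the unspaced tail splits into s:expressions and f:json.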

Code:

$re = '/[ctnsf]:(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\S+?(?=[ctnsf]:|\s|$))/m';
$str = 't:user c:red t:admin n:"bob \\"doe" s:expressionsf:json';

preg_match_all($re, $str, $matches);

// Print the entire match result
print_r($matches[0]);

Upvotes: 1
