Tim

Reputation: 532

PHP Sophisticated String parsing

This may be possible with a regular expression, but I have no idea how. I'm trying to parse a string on a given delimiter, but when the parser reaches a set of parentheses it should handle their contents differently. As I'm a visual learner, let me show you an example of what I'm trying to achieve. (P.S. This string is being parsed from a URL.)

Given the string input:

String1,String2(data1,data2,data3),String3,String4

How can I "transform" this string into this array:

{
    "String1": "String1",
    "String2": [
        "data1",
        "data2",
        "data3"
    ],
    "String3": "String3",
    "String4": "String4"
}

The format doesn't have to be this strict; I'm just trying to make a simple API for my project.

Obviously, something like

array explode ( string $delimiter , string $string [, int $limit = PHP_INT_MAX ] )

wouldn't work, because there are commas inside the parentheses as well. I've tried parsing manually, looking at one character at a time, but I'm worried about performance, and it doesn't actually work anyway. Here is the gist of my attempt:

https://gist.github.com/Fudge0952/24cb4e6a4ec288a4c492
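For comparison, a character-by-character scanner along those lines can work. This is a sketch, not the code from the gist: `parseFlat` is a hypothetical name, and it assumes a single level of parentheses, no whitespace around items, and no escaping of `,`, `(` or `)`. Each character is visited once, so the cost grows linearly with the length of the input.

```php
<?php
// Hypothetical helper, not the code from the gist: one left-to-right pass
// over the string. Assumes a single level of parentheses, no whitespace
// around items and no escaping of ',', '(' or ')'.
function parseFlat(string $input): array
{
    $result = [];
    $name = '';         // characters of the current top-level item
    $inner = '';        // characters between '(' and ')'
    $inParens = false;  // are we currently inside a (...) group?

    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $c = $input[$i];
        if ($inParens) {
            if ($c === ')') {
                // Group closed: the item maps to the list of its values.
                $result[$name] = explode(',', $inner);
                $name = '';
                $inner = '';
                $inParens = false;
            } else {
                $inner .= $c;
            }
        } elseif ($c === '(') {
            $inParens = true;
        } elseif ($c === ',') {
            // Top-level comma: a plain item maps to itself.
            if ($name !== '') {
                $result[$name] = $name;
                $name = '';
            }
        } else {
            $name .= $c;
        }
    }
    if ($inParens) {
        throw new InvalidArgumentException('Unclosed parenthesis');
    }
    if ($name !== '') {
        $result[$name] = $name;
    }
    return $result;
}

print_r(parseFlat('String1,String2(data1,data2,data3),String3,String4'));
```

Since the delimiters are single ASCII bytes, indexing the string byte by byte is safe even for UTF-8 item names.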

Upvotes: 2

Views: 117

Answers (3)

fusion3k

Reputation: 11689

This is a solution with preg_match_all():

$string = 'String1,String2(data1,data2,data3),String3,String4,String5(data4,data5,data6)';

$pattern = '/([^,(]+)(\(([^)]+)\))?/';

preg_match_all( $pattern, $string, $matches );

$result = array();
foreach( $matches[1] as $key => $val )
{
    if( $matches[3][$key] )
    { $add = explode( ',', $matches[3][$key] ); }
    else
    { $add = $val; }
    $result[$val] = $add;
}

$json = json_encode( $result );

3v4l.org demo

Pattern explanation:

([^,(]+)        group 1: any chars except ‘,’ and ‘(’
(\(([^)]+)\))?  group 2: zero or one occurrence of brackets wrapping:
   └──┬──┘
   ┌──┴──┐
   ([^)]+)      group 3: any chars except ‘)’
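The same pattern can also be consumed with the `PREG_SET_ORDER` flag, so each match arrives as one array. The sketch below wraps it in a hypothetical `parseWithRegex` function and tests group 3 with `isset()` and `!== ''` instead of plain truthiness, so an item like `x(0)` is not mistaken for one without parentheses. Like the answer's code, it assumes parentheses are not nested.

```php
<?php
// Hypothetical wrapper around the answer's pattern, using PREG_SET_ORDER.
// Assumes no nested parentheses (the pattern stops at the first ')').
function parseWithRegex(string $string): array
{
    $pattern = '/([^,(]+)(\(([^)]+)\))?/';
    preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        $name = $m[1];
        // Group 3 is absent (or empty) when the item has no parentheses;
        // a truthiness test would wrongly treat the value "0" as empty.
        $hasGroup = isset($m[3]) && $m[3] !== '';
        $result[$name] = $hasGroup ? explode(',', $m[3]) : $name;
    }
    return $result;
}

var_export(parseWithRegex('String1,String2(data1,data2,data3),String3'));
```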

Upvotes: 1

VolkerK

Reputation: 96159

You can either build an ad-hoc parser like this one (mostly untested):

<?php
$p = '!
    [^,\(\)]+  # token: String
    |,         # token: comma
    |\(        # token: open
    |\)        # token: close
!x';
$input = 'String1,String2(data1,data2,data3,data4(a,b,c)),String3,String4';

preg_match_all($p, $input, $m);
// use a NoRewindIterator so nested foreach loops can continue on the same iterator
$it = new NoRewindIterator(
    new ArrayIterator($m[0])
);

var_export( foo( $it ) );

function foo($tokens, $level=0) {
    $result = [];
    $current = null;
    foreach( $tokens as $t ) {
        switch($t) {
            case ')':
                break 2; // leave the foreach loop, not just the switch
            case '(':
                if ( is_null($current) ) {
                    throw new Exception('Unexpected "(" without a preceding name');
                }
                $tokens->next();
                $result[$current] = foo($tokens, $level+1);
                $current = null;
                break;
            case ',':
                if ( !is_null($current) ) {
                    $result[] = $current;
                    $current = null;
                }
                break;
            default:
                $current = $t;
                break;
        }   
    }
    if ( !is_null($current) ) {
        $result[] = $current;
    }
    return $result;
}

prints

array (
  0 => 'String1',
  'String2' => 
  array (
    0 => 'data1',
    1 => 'data2',
    2 => 'data3',
    'data4' => 
    array (
      0 => 'a',
      1 => 'b',
      2 => 'c',
    ),
  ),
  1 => 'String3',
  2 => 'String4',
)

(but it will almost certainly fail badly on strings that are not well-formed)
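One cheap safeguard against malformed input is to check that the parentheses balance before tokenizing. The sketch below uses a hypothetical `parensBalanced` helper; it only checks nesting, not other problems such as empty names or stray commas.

```php
<?php
// Hypothetical pre-check for the token-based parser: returns true only if
// every ')' closes an earlier '(' and every '(' is eventually closed.
// It does not validate anything else (e.g. empty names or stray commas).
function parensBalanced(string $input): bool
{
    $depth = 0;
    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        if ($input[$i] === '(') {
            $depth++;
        } elseif ($input[$i] === ')') {
            $depth--;
            if ($depth < 0) {
                return false; // ')' with no matching '('
            }
        }
    }
    return $depth === 0;
}

var_dump(parensBalanced('String1,String2(data1,data4(a,b,c)),String3')); // bool(true)
var_dump(parensBalanced('String1,String2(data1'));                        // bool(false)
```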

or take a look at a lexer/parser generator such as PHP_LexerGenerator and PHP_ParserGenerator.

Upvotes: 1

kungphu

Reputation: 4849

While you could try to split your initial string on commas and ignore anything in parentheses for the first split, this necessarily makes assumptions about what those string values can actually be (possibly requiring escaping/unescaping values depending on what those strings have to contain).
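A first-level split of that kind only needs a depth counter. In the sketch below, `splitTopLevel` is a hypothetical name; it bakes in exactly the assumption mentioned above, namely that values never contain a literal `,`, `(` or `)`, since there is no escaping.

```php
<?php
// Hypothetical helper: splits on commas at parenthesis depth 0 only.
// Assumes values never contain literal ',', '(' or ')' (no escaping).
function splitTopLevel(string $input): array
{
    $parts = [];
    $current = '';
    $depth = 0;
    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $c = $input[$i];
        if ($c === '(') {
            $depth++;
        } elseif ($c === ')') {
            $depth--;
        } elseif ($c === ',' && $depth === 0) {
            // Top-level comma: finish the current part, drop the comma.
            $parts[] = $current;
            $current = '';
            continue;
        }
        $current .= $c; // parentheses and nested commas are kept
    }
    $parts[] = $current;
    return $parts;
}

print_r(splitTopLevel('String1,String2(data1,data2,data3),String3'));
```

Each part can then be handled separately, for example by splitting off a trailing `(...)` group.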

If you have control over the data format, though, it would be far better to just start with JSON. It's well-defined and well-supported.
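With JSON, the server side reduces to one `json_decode()` call. The sketch below assumes the client sends the structure as a JSON string; a literal stands in for the request data, and how it is transported (query parameter name, POST body) is up to you.

```php
<?php
// Assumes the client sends the structure as JSON; a literal stands in
// for the request data here.
$raw = '{"String1":"String1","String2":["data1","data2","data3"],"String3":"String3"}';

// true => decode objects as associative arrays; JSON_THROW_ON_ERROR (PHP 7.3+)
// turns malformed input into a JsonException instead of returning null.
$data = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);

var_export($data['String2']);
```

If the JSON arrives in a query string, PHP URL-decodes `$_GET` values before your code sees them, so only the `JsonException` needs handling.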

Upvotes: 1
