Donovan

Reputation: 6122

Explode complex string by commas in PHP

I need to split a string that contains commas. I already found a solution (str_getcsv) for strings whose values are quoted, like:

'A', 'B with a comma, eh', 'C'

But my string looks like this, with no enclosing characters around the values:

A, B (one, two), C

I need to explode this and obtain:

array(3) {
  [0]=>
  string(1) "A"
  [1]=>
  string(12) "B (one, two)"
  [2]=>
  string(1) "C"
}

I want to split the string on the commas that are not inside parentheses, because that is the only case in my situation where explode fails.

Upvotes: 2

Views: 1216

Answers (3)

mickmackusa

Reputation: 47883

More elegantly than creating an array and then filtering the results, you can use preg_split() in this one-function one-liner:

Code:

$string='A, B (one, two), C';
var_export(preg_split('/(?:\([^)]*\)(*SKIP)(*FAIL))|, /',$string));

Output:

array (
  0 => 'A',
  1 => 'B (one, two)',
  2 => 'C',
)


  • (*SKIP)(*FAIL) is a technique that matches and then discards the substrings you want to disqualify: here, each parenthesized group is consumed and rejected, so the commas inside it are never tried as delimiters.
  • The negated character class [^)]* is a faster alternative to . (dot).

Note: if you have nested parenthesized expressions, this pattern will not work. Writing a pattern for that scenario is a bit beyond the scope of this question.
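To make the effect of (*SKIP)(*FAIL) concrete, here is a small sketch that runs a plain explode() and the guarded preg_split() on the sample string from the question. The variable names $naive and $guarded are only illustrative.

```php
<?php
// Sample input taken from the question.
$string = 'A, B (one, two), C';

// Naive split: every ", " is a delimiter, including the one inside (...).
$naive = explode(', ', $string);

// Guarded split: the first alternative consumes a whole "(...)" group and
// then forces a failure, so the engine resumes after the group. Only the
// second alternative (", ") can ever produce a match.
$guarded = preg_split('/\([^)]*\)(*SKIP)(*FAIL)|, /', $string);

var_export($naive);   // 'A', 'B (one', 'two)', 'C'
echo "\n";
var_export($guarded); // 'A', 'B (one, two)', 'C'
```

The only difference between the two calls is the skipped alternative, which is what keeps "B (one, two)" in one piece.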

Upvotes: 0

DarkSide

Reputation: 3709

This snippet helps me with nested parentheses. The basic idea is to repeatedly replace each innermost (...) group with a placeholder until no parentheses remain, then explode the string by comma, and finally put everything back. This is not an ideal solution (I made it just now in about 30 minutes), but it works :) It can definitely be optimized.

/**
 * Explode a string by a delimiter, but don't split where the delimiter is inside parentheses.
 * This also supports nested parentheses - that's where pure RegExp solutions fail.
 * 
 * For example,
 *  $input = "one, two three, four (five, (six, seven), (eight)) (nine, ten), eleven";
 *  $output = array(
 *      'one',
 *      'two three',
 *      'four (five, (six, seven), (eight)) (nine, ten)',
 *      'eleven'
 *  );
 * 
 * @param string $input
 * @param string $delimiter = ,
 * @param string $open_tag = \(
 * @param string $close_tag = \)
 * @return array
 */
function exploder($input, $delimiter = ',', $open_tag = '\(', $close_tag = '\)')
{
    // this matches an innermost pair of parentheses together with its content,
    // i.e. a pair that contains no nested parentheses
    $regexp = '/'.$open_tag.'[^'.$open_tag.$close_tag.']*'.$close_tag.'/';

    // put in placeholders like {{\d}}. They can be nested.
    $r = array();
    while (preg_match_all($regexp, $input, $matches)) {
        if ($matches[0]) {
            foreach ($matches[0] as $match) {
                $r[] = $match;
                $input = str_replace($match, '{{'.count($r).'}}', $input);
            }
        } else {
            break;
        }
    }
    $output = array_map('trim', explode($delimiter, $input));

    // put everything back
    foreach ($output as &$a) {
        while (preg_match('/{{(\d+)}}/', $a, $matches)) {
            $a = str_replace($matches[0], $r[$matches[1] - 1], $a);
        }
    }
    unset($a); // drop the reference left behind by the by-reference foreach

    return $output;
}

$a = "one, two three, four (five, (six, seven), (eight)) (nine, ten), eleven";
var_dump(exploder($a));

This will output:

array (size=4)
  0 => string 'one' (length=3)
  1 => string 'two three' (length=9)
  2 => string 'four (five, (six, seven), (eight)) (nine, ten)' (length=46)
  3 => &string 'eleven' (length=6)

as expected.
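For comparison, the same result can be reached without placeholders by scanning the string once and tracking the nesting depth. This is a different technique from the snippet above, given as a minimal sketch only: the function name splitTopLevel() is hypothetical, and it assumes single-byte delimiter and bracket characters.

```php
<?php
/**
 * Hypothetical helper: split $input on $delimiter, ignoring delimiters
 * that sit inside (possibly nested) $open/$close pairs.
 * Assumes $delimiter, $open and $close are single-byte characters.
 */
function splitTopLevel($input, $delimiter = ',', $open = '(', $close = ')')
{
    $parts = array();
    $current = '';
    $depth = 0;
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($char === $open) {
            $depth++;
        } elseif ($char === $close && $depth > 0) {
            $depth--; // stray closing brackets are ignored
        }

        if ($char === $delimiter && $depth === 0) {
            // top-level delimiter: close the current piece
            $parts[] = trim($current);
            $current = '';
        } else {
            $current .= $char;
        }
    }
    $parts[] = trim($current); // last piece after the final delimiter

    return $parts;
}

$a = "one, two three, four (five, (six, seven), (eight)) (nine, ten), eleven";
var_export(splitTopLevel($a));
```

Because it never rewrites the input, this version cannot be confused by text that already contains something like {{1}}.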

Upvotes: 1

disjunction

Reputation: 656

But there is a solution for your crazy wish ;)

$a = "(Z) X, (Y, W) A, B (one, two), C, D (E,F,G) H, I J";
$reg = '/[^(,]*(?:\([^)]+\))?[^),]*/';
preg_match_all($reg, $a, $matches);
$result = array_filter($matches[0]);
var_dump($result);
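The matches from this pattern can carry leading spaces, and array_filter() keeps the original keys, so the result may need tidying before use. The following is only a sketch of one way to do that, reusing the input and pattern from above; the $nonEmpty variable is illustrative.

```php
<?php
// Sample input and pattern copied from the answer above.
$a = "(Z) X, (Y, W) A, B (one, two), C, D (E,F,G) H, I J";
$reg = '/[^(,]*(?:\([^)]+\))?[^),]*/';
preg_match_all($reg, $a, $matches);

// Drop empty matches explicitly (a bare array_filter() would also drop "0"),
// then trim surrounding whitespace and reindex the keys from zero.
$nonEmpty = array_filter($matches[0], function ($piece) {
    return $piece !== '';
});
$result = array_values(array_map('trim', $nonEmpty));

var_export($result);
```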

Upvotes: 5
