Reputation: 6122
I need to split a string that contains commas. I already found a solution (str_getcsv) for strings like this:
'A', 'B with a comma, eh', 'C'
But my string looks like this, i.e. with no enclosing characters around the values:
A, B (one, two), C
I need to explode this and obtain:
array(3) {
[0]=>
string(1) "A"
[1]=>
string(12) "B (one, two)"
[2]=>
string(1) "C"
}
I want to split the string on the commas that are not inside parentheses, because that is the only case in my situation where explode fails.
Upvotes: 2
Views: 1216
Reputation: 47883
More elegantly than creating an array and then filtering the results, you can use preg_split() in this one-function one-liner:
Code:
$string='A, B (one, two), C';
var_export(preg_split('/(?:\([^)]*\)(*SKIP)(*FAIL))|, /',$string));
Output:
array (
0 => 'A',
1 => 'B (one, two)',
2 => 'C',
)
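As a small variation on the one-liner above: the delimiter `, ` requires exactly one space after each comma. If the input might have irregular spacing around commas (an assumption, the question only shows `, `), a sketch with `\s*,\s*` as the delimiter could look like this. The sample string is made up for illustration.

```php
<?php
// Hypothetical input with inconsistent spacing around the commas.
$string = 'A,B (one,two) ,  C';

// Same idea as above: skip anything inside (...) and split on the remaining commas,
// but also swallow any whitespace on either side of the comma.
$parts = preg_split('/\([^)]*\)(*SKIP)(*FAIL)|\s*,\s*/', $string);

var_export($parts);
```

Like the original pattern, this still does not handle nested parentheses.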
(*SKIP)(*FAIL) is a technique that disqualifies substrings before matching. [^)]* is a faster alternative to . (dot).
*If you have nested bracketed expressions, this pattern will not work ... writing a pattern for that scenario is a bit beyond the scope of this question.
Upvotes: 0
Reputation: 3709
This snippet helps me with nested parentheses. The basic idea is to repeatedly replace each innermost (...) group with a placeholder until no parentheses are left, then explode the string by comma, and finally put everything back. This is not an ideal solution (I wrote it just now in about 30 minutes), but it works :) It can definitely be optimized.
/**
 * Explode a string by delimiter, but don't split where the delimiter is inside parentheses.
 * This also supports nested parentheses - that's where pure RegExp solutions fail.
 *
 * For example,
 * $input = "one, two three, four (five, (six, seven), (eight)) (nine, ten), eleven";
 * $output = array(
 *     'one',
 *     'two three',
 *     'four (five, (six, seven), (eight)) (nine, ten)',
 *     'eleven'
 * );
 *
 * @param string $input
 * @param string $delimiter = ,
 * @param string $open_tag = \(
 * @param string $close_tag = \)
 * @return array
 */
function exploder($input, $delimiter = ',', $open_tag = '\(', $close_tag = '\)')
{
    // this will match any text inside parentheses,
    // including the parentheses themselves, but without nested parentheses
    $regexp = '/'.$open_tag.'[^'.$open_tag.$close_tag.']*'.$close_tag.'/';

    // put in placeholders like {{\d}}. They can be nested.
    $r = array();
    while (preg_match_all($regexp, $input, $matches)) {
        if ($matches[0]) {
            foreach ($matches[0] as $match) {
                $r[] = $match;
                $input = str_replace($match, '{{'.count($r).'}}', $input);
            }
        } else {
            break;
        }
    }

    $output = array_map('trim', explode($delimiter, $input));

    // put everything back
    foreach ($output as &$a) {
        while (preg_match('/{{(\d+)}}/', $a, $matches)) {
            $a = str_replace($matches[0], $r[$matches[1] - 1], $a);
        }
    }
    // break the reference so the last element is not left bound to $a
    unset($a);

    return $output;
}

$a = "one, two three, four (five, (six, seven), (eight)) (nine, ten), eleven";
var_dump(exploder($a));
This will output:
array (size=4)
  0 => string 'one' (length=3)
  1 => string 'two three' (length=9)
  2 => string 'four (five, (six, seven), (eight)) (nine, ten)' (length=46)
  3 => string 'eleven' (length=6)
as expected.
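For comparison, here is a minimal sketch of a different technique that gives the same result on this input: a single pass over the characters with a depth counter, instead of placeholders. The function name `splitOutsideParens` is hypothetical, and the sketch assumes a one-character delimiter and plain `(` / `)` as the brackets.

```php
<?php
/**
 * Hypothetical helper (not part of the snippet above): split on a one-character
 * delimiter, ignoring delimiters inside (possibly nested) parentheses.
 */
function splitOutsideParens(string $input, string $delimiter = ','): array
{
    $parts = [];
    $current = '';
    $depth = 0; // how many parentheses are currently open
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        if ($char === '(') {
            $depth++;
        } elseif ($char === ')' && $depth > 0) {
            $depth--;
        }

        if ($char === $delimiter && $depth === 0) {
            // delimiter outside all parentheses: close the current piece
            $parts[] = trim($current);
            $current = '';
        } else {
            $current .= $char;
        }
    }

    // whatever is left after the last delimiter is the final piece
    $parts[] = trim($current);

    return $parts;
}

$parts = splitOutsideParens('one, two three, four (five, (six, seven), (eight)) (nine, ten), eleven');
var_export($parts);
```

This avoids the placeholder bookkeeping, at the cost of being less configurable than `exploder()` (no custom open/close tags). An unbalanced closing parenthesis is simply ignored here, which is an arbitrary choice for the sketch.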
Upvotes: 1
Reputation: 656
But there is a solution for your crazy wish ;)
$a = "(Z) X, (Y, W) A, B (one, two), C, D (E,F,G) H, I J";
$reg = '/[^(,]*(?:\([^)]+\))?[^),]*/';
preg_match_all($reg, $a, $matches);
$result = array_filter($matches[0]);
var_dump($result);
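Note that the matches from this pattern keep the whitespace that follows each comma, and array_filter() preserves the original keys (and, without a callback, would also drop a value of "0"). A possible cleanup, sketched here as an assumption about what the final array should look like, is to trim each match, drop only the empty strings, and reindex:

```php
<?php
$a = "(Z) X, (Y, W) A, B (one, two), C, D (E,F,G) H, I J";
$reg = '/[^(,]*(?:\([^)]+\))?[^),]*/';
preg_match_all($reg, $a, $matches);

// Trim surrounding whitespace from every match.
$trimmed = array_map('trim', $matches[0]);

// Drop only the empty strings (keeps a literal "0"), then reindex from 0.
$result = array_values(array_filter($trimmed, static function (string $part): bool {
    return $part !== '';
}));

var_export($result);
```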
Upvotes: 5