Reputation: 179
I need to split a string by commas and spaces, but ignore any that appear inside double quotes, single quotes, or parentheses:
$str = "Questions, \"Quote\",'single quote','comma,inside' (inside parentheses) space #specialchar";
so that the resulting array will contain
[0]Questions [1]Quote [2]single quote [3]comma,inside [4]inside parentheses [5]space [6]#specialchar
My current regexp is
$tags = preg_split("/[,\s]*[^\w\s]+[\s]*/", $str,0,PREG_SPLIT_NO_EMPTY);
but this drops the special characters and still splits on the commas inside quotes. The resulting array is:
[0]Questions [1]Quote [2]single quote [3]comma [4]inside [5]inside parentheses [6]space [7]specialchar
P.S. This is not CSV.
Many thanks.
Upvotes: 5
Views: 5628
Reputation: 75222
Well, this works for the data you supplied:
$rgx = <<<'EOT'
/
[,\s]++
(?=(?:(?:[^"]*+"){2})*+[^"]*+$)
(?=(?:(?:[^']*+'){2})*+[^']*+$)
(?=(?:[^()]*+\([^()]*+\))*+[^()]*+$)
/x
EOT;
The lookaheads assert that if there are any double-quotes, single-quotes or parentheses ahead of the current match position there's an even number of them, and the parens are in balanced pairs (no nesting allowed). That's a quick-and-dirty way to ensure that the current match isn't occurring inside a pair of quotes or parens.
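As a rough illustration of how the pattern might be applied (a sketch added for clarity, not part of the answer as posted): the lookaheads only decide where a split may happen, so the quotes and parentheses stay attached to each piece. The trim() step below is an assumption introduced here to strip them; it would also remove those characters from the edges of unquoted tokens.

```php
<?php
$str = "Questions, \"Quote\",'single quote','comma,inside' (inside parentheses) space #specialchar";

$rgx = <<<'EOT'
/
[,\s]++
(?=(?:(?:[^"]*+"){2})*+[^"]*+$)
(?=(?:(?:[^']*+'){2})*+[^']*+$)
(?=(?:[^()]*+\([^()]*+\))*+[^()]*+$)
/x
EOT;

// Split only on separators that sit outside quotes and parentheses.
$pieces = preg_split($rgx, $str, -1, PREG_SPLIT_NO_EMPTY);
// $pieces still carries the delimiters, e.g. "Quote" with its quotes
// and (inside parentheses) with its parentheses.

// Assumption: trimming quote and paren characters from both ends is enough.
$tags = array_map(function ($piece) {
    return trim($piece, "\"'()");
}, $pieces);

print_r($tags);
// Questions, Quote, single quote, comma,inside, inside parentheses, space, #specialchar
```

If tokens may legitimately begin or end with a quote or parenthesis, a stricter clean-up step than trim() would be needed.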
Of course, it assumes the input is well formed. But on the subject of well-formedness, what about escaped quotes within quotes? What if you have quotes inside parens, or vice versa? Would this input be legal?
"not a \" quote", 'not a ) quote', (not ",' quotes)
If so, you've got a much more difficult job ahead of you.
Upvotes: 2
Reputation: 4814
This will work only for non-nested parentheses:
$regex = <<<HERE
/ " ( (?:[^"\\\\]++|\\\\.)*+ ) \"
| ' ( (?:[^'\\\\]++|\\\\.)*+ ) \'
| \( ( [^)]* ) \)
| [\s,]+
/x
HERE;
$tags = preg_split($regex, $str, -1,
PREG_SPLIT_NO_EMPTY
| PREG_SPLIT_DELIM_CAPTURE);
The ++ and *+ are possessive quantifiers: they consume as much as they can and give nothing back for backtracking. This technique is described in perlre(1) as the most efficient way to do this kind of matching.
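To make the effect of the capture groups concrete, here is a small sketch (the $tricky input, $flags, and $trickyTags names are illustrative assumptions) that runs the same pattern on the string from the question and on the harder input with an escaped quote raised in the other answer. Note that the backslash of an escaped quote is kept in the captured text.

```php
<?php
// Heredoc: each \\\\ becomes \\ in the pattern, i.e. one literal backslash.
$regex = <<<HERE
/  "  ( (?:[^"\\\\]++|\\\\.)*+ ) \"
 | '  ( (?:[^'\\\\]++|\\\\.)*+ ) \'
 | \( ( [^)]* )                  \)
 | [\s,]+
/x
HERE;

// Quoted and parenthesized runs are matched as delimiters; DELIM_CAPTURE
// returns their captured contents, NO_EMPTY drops the empty pieces.
$flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;

$str = "Questions, \"Quote\",'single quote','comma,inside' (inside parentheses) space #specialchar";
$tags = preg_split($regex, $str, -1, $flags);
print_r($tags);
// Questions, Quote, single quote, comma,inside, inside parentheses, space, #specialchar

// Nowdoc keeps the backslash in the input exactly as written.
$tricky = <<<'TXT'
"not a \" quote", 'not a ) quote', (not ",' quotes)
TXT;
$trickyTags = preg_split($regex, $tricky, -1, $flags);
print_r($trickyTags);
// not a \" quote | not a ) quote | not ",' quotes
```

Removing the escape backslashes from the captured pieces (for example with stripslashes()) would be a separate post-processing step, if it is wanted at all.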
Upvotes: 6