Reputation: 7438
I've been trying to extract something inside a string. I got the follwing string :
*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george
I want to split by commas outside parentheses and it is suppose to give this:
[
"*",
"bob",
"DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd'))",
"george"
]
I've been trying to use explode but it cut even inside ( and ) ... logic by the function mean.
So I've did this : [^(,\s]+|\([^)]+\)
but it give cut even if a commas is found inside bracket.
Anyone know how to do what I mean?
EDIT :
Ok to be very clear and direct.
I got this : SELECT MyField, Field2, Blabla, Function(param), etc FROM table Blabla
I got the string MyField, Field2, Blabla, Function(param), etc
already because the query is done by multiple function class like $DB->Select('MyField, Field2, Blabla, Function(param), etc');
but now I want to parse everything between commas so MyField, Field2, Blabla, Function(param), etc
become this :
Upvotes: 1
Views: 160
Reputation: 454
I'm not really sure about what you want to do here.. But if you just want to extract strings. You can just use implode.
$array = array("*", "bob", "DATE('gdfgfd', 'Fdsfds', '(\"fdsfdfsd\"))", "george");
echo $test = implode($array, ",");
Upvotes: -1
Reputation: 784998
You can use this regex based code to get the split result the way you want:
$str = "*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george";
$arr = preg_split('/([^,]*(?:\([^)]*\))[^,]*)+|,/', $str, -1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
Though my original answer worked for the example that OP posted but due the concerns raised by some members I am posting a solution that will work with nested parenthesis as well as long as brackets are balanced:
$str = "*, bob, DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd', ('foo'))) 'foo'=[bar]," .
"john, MY('gdfgfd', ((('Fdsfds'))), ('fdsfdfsd')), george";
$arr = preg_split('/\s*( [^,()]* \( ( [^()]* | (?R) )* \) [^,()]* ) ,?\s* | \s*,\s*/x',
$str, -1 , PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($arr);
OUTPUT:
Array
(
[0] => *
[1] => bob
[2] => DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd', ('foo'))) 'foo'=[bar]
[3] => john
[4] => MY('gdfgfd', ((('Fdsfds'))), ('fdsfdfsd'))
[5] => george
)
Caution: Even though this recursion based regex pattern works with deep nested brackets now it doesn't mean that this cannot be broken for some edge case situations (like unbalanced brackets).
Upvotes: 0
Reputation: 140210
Here's what I cooked up, doesn't support multibyte characters:
Edit: added string awareness
<?php
$stack = array();
$stuff = array();
$escaping = false;
$input = "*, bob, [], DATE('g()d\\'f,gfd', ('Fd()sf)ds'), ('fdsfd\"\"()fsd')), ',(),() (,,'";
$len = strlen( $input );
$i = 0;
$curstr = "";
$char;
while( $i < $len ) {
$char = $input[$i++];
if( $escaping ) {
$curstr .= $char;
$escaping = false;
continue;
}
switch( $char ) {
case "\\":
$escaping = true;
break;
case '"':
$top = end( $stack );
if( $top === '"' ) {
array_pop( $stack );
}
else if( $top !== "'" ){
$stack[] = '"';
}
$curstr .= $char;
break;
case "'":
$top = end( $stack );
if( $top === "'" ) {
array_pop( $stack );
}
else if( $top !== '"' ) {
$stack[] = "'";
}
$curstr .= $char;
break;
case ",":
if( count( $stack ) ) {
$curstr .= $char;
}
else {
$stuff[] = trim($curstr);
$curstr = "";
}
break;
case "(":
$top = end( $stack );
if( $top !== "'" && $top !== '"' ) {
$stack[] = "(";
}
$curstr .= $char;
break;
case ")":
$top = end( $stack );
if( $top !== "'" && $top !== '"' ) {
if( end($stack) !== "(" ) {
die( "Unbalanced parentheses" );
}
array_pop( $stack );
}
$curstr .= $char;
break;
default:
$curstr .= $char;
break;
}
}
if( count( $stack ) ) {
die( "Unbalanced ".end($stack) );
}
$stuff[] = trim( $curstr );
print_r( $stuff );
/*
Array
(
[0] => *
[1] => bob
[2] => []
[3] => DATE('g()d'f,gfd', ('Fd()sf)ds'), ('fdsfd""()fsd'))
[4] => ',(),() (,,'
)
*/
Upvotes: 2
Reputation: 2661
This will work (for the most part). It will fail if you have brackets within quotes (part of the data). You can extend the code to handle quoted brackets if you want (but then you have to consider escaped quotes and everything like that. A regular expression will never work well.
Edit: Better to use the PHP SQL Parser as answered by SpikeX.
function unreliable_comma_explode($str)
{
$last_split = 0;
$len = strlen($str);
$brackets = 0;
$parts = array();
for ($i = 0; $i < $len; $i++)
{
if ($str[$i] == '(')
{
$brackets++;
continue;
}
if ($str[$i] == ')')
{
if (--$brackets == -1) $brackets = 0;
continue;
}
if ($str[$i] == ',' && $brackets == 0)
{
$parts[] = substr($str, $last_split, ($i-$last_split));
$last_split = $i + 1;
}
}
if (($len-$last_split) > 0)
$parts[] = substr($str, $last_split, ($len-$last_split));
return $parts;
}
Upvotes: 0
Reputation: 17119
Posting this as an answer since it's probably better than anything else:
http://code.google.com/p/php-sql-parser/
Use that project to parse your SQL statements. The results come back as an array, including the bits in between SELECT
and FROM
as individual elements, just as you want. This will work far better than any regular expression solution you use.
Upvotes: 4
Reputation: 8486
You stated in your comments that you're prepared to use recursion because you have nested lists. However, regex cannot do recursion. This is because regex cannot "count" anything indefinitely. Since it has no way of counting open/close parenthesis, it can't know how many levels in it is, or how many levels out it must go.
You can write horrendously complex regex to handle N levels of depth (see anubhava's answer), but as soon as you run across an expression with N+1 levels of depth your regex will fail. This is why we use programming languages to parse irregular languages because they can count recursion (see diolemo's answer). Within this recursion, we can use small bits of regex.
Upvotes: 0