David Bélanger
David Bélanger

Reputation: 7438

Parse and split comma-separated fields and expression from a SQL SELECT clause

I've been trying to extract something inside a string. I got the follwing string :

*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george

I want to split by commas outside parentheses and it is suppose to give this:

[
    "*",
    "bob",
    "DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd'))",
    "george"
]

I've been trying to use explode but it cut even inside ( and ) ... logic by the function mean.

So I've did this : [^(,\s]+|\([^)]+\) but it give cut even if a commas is found inside bracket.

Anyone know how to do what I mean?

EDIT :

Ok to be very clear and direct.

I got this : SELECT MyField, Field2, Blabla, Function(param), etc FROM table Blabla

I got the string MyField, Field2, Blabla, Function(param), etc already because the query is done by multiple function class like $DB->Select('MyField, Field2, Blabla, Function(param), etc'); but now I want to parse everything between commas so MyField, Field2, Blabla, Function(param), etc become this :

Upvotes: 1

Views: 160

Answers (6)

Michael Roewin Tan
Michael Roewin Tan

Reputation: 454

I'm not really sure about what you want to do here.. But if you just want to extract strings. You can just use implode.

$array = array("*", "bob", "DATE('gdfgfd', 'Fdsfds', '(\"fdsfdfsd\"))", "george");
echo $test = implode($array, ",");

Upvotes: -1

anubhava
anubhava

Reputation: 784998

You can use this regex based code to get the split result the way you want:

$str = "*, bob, DATE('gdfgfd', 'Fdsfds', ('fdsfdfsd')), george";
$arr = preg_split('/([^,]*(?:\([^)]*\))[^,]*)+|,/', $str, -1,
                      PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

Update:

Though my original answer worked for the example that OP posted but due the concerns raised by some members I am posting a solution that will work with nested parenthesis as well as long as brackets are balanced:

$str = "*, bob, DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd', ('foo'))) 'foo'=[bar]," .
       "john, MY('gdfgfd', ((('Fdsfds'))), ('fdsfdfsd')), george";
$arr = preg_split('/\s*( [^,()]* \( ( [^()]* | (?R) )* \) [^,()]* ) ,?\s* | \s*,\s*/x',
                  $str, -1 , PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($arr);

OUTPUT:

Array
(
    [0] => *
    [1] => bob
    [2] => DATE('gdfgfd', ('Fdsfds'), ('fdsfdfsd', ('foo'))) 'foo'=[bar]
    [3] => john
    [4] => MY('gdfgfd', ((('Fdsfds'))), ('fdsfdfsd'))
    [5] => george
)

Caution: Even though this recursion based regex pattern works with deep nested brackets now it doesn't mean that this cannot be broken for some edge case situations (like unbalanced brackets).

Upvotes: 0

Esailija
Esailija

Reputation: 140210

Here's what I cooked up, doesn't support multibyte characters:

Edit: added string awareness

<?php


$stack = array();
$stuff = array();

$escaping = false;
$input = "*, bob, [], DATE('g()d\\'f,gfd', ('Fd()sf)ds'), ('fdsfd\"\"()fsd')), ',(),() (,,'";
$len = strlen( $input );
$i = 0;
$curstr = "";
$char;

while( $i < $len ) {
    $char = $input[$i++];

    if( $escaping ) {
        $curstr .= $char;
        $escaping = false;
        continue;
    }

    switch( $char ) {

        case "\\":
            $escaping = true;
            break;

        case '"':
            $top = end( $stack );
            if( $top === '"' ) {
                array_pop( $stack );
            }
            else if( $top !== "'" ){
                $stack[] = '"';
            }

            $curstr .= $char;
            break;

        case "'":
            $top = end( $stack );
            if( $top === "'" ) {
                array_pop( $stack );
            }
            else if( $top !== '"' ) {
                $stack[] = "'";
            }

            $curstr .= $char;           
            break;

        case ",":
            if( count( $stack ) ) {
                $curstr .= $char;
            }
            else {
                $stuff[] = trim($curstr);
                $curstr = "";
            }
            break;

        case "(":
            $top = end( $stack );
            if( $top !== "'" && $top !== '"' ) {
                $stack[] = "(";                   
            }

            $curstr .= $char;
            break;

        case ")":
            $top = end( $stack );

            if( $top !== "'" && $top !== '"' ) {
                if( end($stack) !== "(" ) {
                    die( "Unbalanced parentheses" );
                }
                array_pop( $stack );
            }

            $curstr .= $char;


            break;

        default:
            $curstr .= $char;
            break;

    }
}

if( count( $stack ) ) {
    die( "Unbalanced ".end($stack) );
}

$stuff[] = trim( $curstr );

print_r( $stuff );

/*
    Array
(
    [0] => *
    [1] => bob
    [2] => []
    [3] => DATE('g()d'f,gfd', ('Fd()sf)ds'), ('fdsfd""()fsd'))
    [4] => ',(),() (,,'
)

*/

Upvotes: 2

diolemo
diolemo

Reputation: 2661

This will work (for the most part). It will fail if you have brackets within quotes (part of the data). You can extend the code to handle quoted brackets if you want (but then you have to consider escaped quotes and everything like that. A regular expression will never work well.

Edit: Better to use the PHP SQL Parser as answered by SpikeX.

function unreliable_comma_explode($str)
{
   $last_split = 0;
   $len = strlen($str);
   $brackets = 0;
   $parts = array();

   for ($i = 0; $i < $len; $i++)
   {
      if ($str[$i] == '(') 
      {
         $brackets++;
         continue;
      }

      if ($str[$i] == ')')
      {
         if (--$brackets == -1) $brackets = 0;
         continue;
      }

      if ($str[$i] == ',' && $brackets == 0)
      {
         $parts[] = substr($str, $last_split, ($i-$last_split));
         $last_split = $i + 1;
      }
   }

   if (($len-$last_split) > 0)
      $parts[] = substr($str, $last_split, ($len-$last_split));

   return $parts;
}

Upvotes: 0

qJake
qJake

Reputation: 17119

Posting this as an answer since it's probably better than anything else:

http://code.google.com/p/php-sql-parser/

Use that project to parse your SQL statements. The results come back as an array, including the bits in between SELECT and FROM as individual elements, just as you want. This will work far better than any regular expression solution you use.

Upvotes: 4

dlras2
dlras2

Reputation: 8486

You stated in your comments that you're prepared to use recursion because you have nested lists. However, regex cannot do recursion. This is because regex cannot "count" anything indefinitely. Since it has no way of counting open/close parenthesis, it can't know how many levels in it is, or how many levels out it must go.

You can write horrendously complex regex to handle N levels of depth (see anubhava's answer), but as soon as you run across an expression with N+1 levels of depth your regex will fail. This is why we use programming languages to parse irregular languages because they can count recursion (see diolemo's answer). Within this recursion, we can use small bits of regex.

Upvotes: 0

Related Questions