Wald

Reputation: 41

Regular expression for nested brackets that contain a symbol

I need to replace with square brackets only those parentheses that directly contain a comma, at whatever nesting level they occur.

Example of a raw string:

start (one, two, three(*)), some text (1,2,3), and (4, 5(*)), another
(four), interesting (five (6, 7)), text (six($)), here is (seven)

Expected result:

start [one, two, three(*)], some text [1,2,3], and [4, 5(*)], another
(four), interesting (five [6, 7]), text (six($)), here is (seven)

My best attempt does not handle the parts that contain nested parentheses:

preg_replace('~ \( ( [^()]+ (\([^,]+\))? , [^()]+ )+ \) ~x', ' [$1]', $string);

// start (one, two, three(*)), some text [1,2,3], and (4, 5(*)), another (four), interesting (five [6, 7]), text (six($)), here is (seven)

Upvotes: 0

Views: 276

Answers (2)

OK, this is not a regular expression, but in case you don't find one, the following algorithm is your plan B. It is heavily commented, since it might be useful for someone, and that's what StackOverflow is for:

$str = "start (one, two, three(*)), some text (1,2,3), and (4, 5(*)), another " .
       "(four), interesting (five (6, 7)), text (six($)), here is (seven)";
echo $str . "<br/>";

$PARs = array(); // ◄■ POSITIONS OF "(" AND COMMAS.
for ( $i = 0; $i < strlen( $str ); $i++ )
  switch ( $str[ $i ] )
  {
     case "(" : array_push( $PARs, array($i,false) ); // ◄■ POSITION OF "(" (NO COMMA YET).
                break;
     case ")" : $POS = array_pop( $PARs ); // ◄■ [POSITION OF PREVIOUS "("][THERE'S COMMA]
                if ( $POS[1] ) // ◄■ IF THERE WAS COMMA IN CURRENT "()"...
                     {
                       $str[ $POS[0] ] = "["; // ◄■ REPLACE "(".
                       $str[ $i ] = "]"; // ◄■ REPLACE ")".
                     }
                break;
     case "," : if ( ! empty( $PARs ) ) // ◄■ IGNORE COMMAS IF NOT IN "()".
                   $PARs[ count($PARs) - 1 ][1] = true; // COMMA FOUND.
  }

echo $str . // ◄■ RESULT.
     // COMPARE WITH EXPECTED ▼
     "<br/>start [one, two, three(*)], some text [1,2,3], and [4, 5(*)], another " .
     "(four), interesting (five [6, 7]), text (six($)), here is (seven)";

Edit: fixed a bug found by @trincot (thanks).
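
For reuse, the same stack scan could be wrapped in a function. The following is only a sketch: the name `bracketCommaGroups` is made up for illustration, and skipping an unmatched ")" is an assumption about how unbalanced input should be treated, since the question only shows balanced strings.

```php
<?php
// Hypothetical helper name; the scan follows the loop shown above.
function bracketCommaGroups(string $str): string
{
    $open = []; // stack of [position of "(", comma seen directly inside?]
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        switch ($str[$i]) {
            case "(":
                $open[] = [$i, false];
                break;
            case ")":
                if (empty($open)) {
                    break; // assumption: leave an unmatched ")" untouched
                }
                [$pos, $hasComma] = array_pop($open);
                if ($hasComma) {
                    $str[$pos] = "["; // replace the matching "("
                    $str[$i] = "]";   // replace this ")"
                }
                break;
            case ",":
                if (!empty($open)) {
                    // mark only the innermost open pair
                    $open[count($open) - 1][1] = true;
                }
                break;
        }
    }
    return $str;
}

$raw = 'start (one, two, three(*)), some text (1,2,3), and (4, 5(*)), another '
     . '(four), interesting (five (6, 7)), text (six($)), here is (seven)';
echo bracketCommaGroups($raw), PHP_EOL;
```

For the sample string this prints the expected result given in the question. Because a comma marks only the innermost open pair, the outer parentheses in `(five (6, 7))` are left as they are.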

Upvotes: 0

trincot

Reputation: 350750

I would tokenise the input by splitting it at commas and parentheses while keeping those delimiters as tokens. A recursive function can then detect whether a comma appears directly inside a given pair of parentheses and make the appropriate replacement.
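
To make the tokenising step concrete, here is a small sketch of what `preg_split` with `PREG_SPLIT_DELIM_CAPTURE` returns. The short input string is made up for illustration.

```php
<?php
// Split on "(", ")" and ",", keeping each delimiter as its own token.
$tokens = preg_split('~([(),])~', 'a (b, c)', -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($tokens);
// $tokens is ['a ', '(', 'b', ',', ' c', ')', '']
```

The trailing empty string appears because the input ends with a delimiter. It is harmless for a consumer that simply concatenates the tokens back together.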

Here is a function doing the job:

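// Helper declared at top level: if it were declared inside replaceWithBrackets(),
// a second call would fail with a "Cannot redeclare" fatal error.
function recur(&$tokens) {
    $comma = false;     // has a comma been seen directly at this nesting level?
    $replaced = "";
    while (true) {
        $token = current($tokens);
        next($tokens);
        if ($token === ")" || $token === false) break; // end of group or of input
        if ($token === "(") {
            [$substr, $subcomma] = recur($tokens);
            $replaced .= $subcomma ? "[$substr]" : "($substr)";
        } else {
            $comma = $comma || $token === ",";
            $replaced .= $token;
        }
    }
    return [$replaced, $comma];
}

function replaceWithBrackets($s) {
    // Split on "(", ")" and ",", keeping the delimiters as tokens.
    $tokens = preg_split("~([(),])~", $s, 0, PREG_SPLIT_DELIM_CAPTURE);
    return recur($tokens)[0];
}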

Upvotes: 2
