SPlatten

Reputation: 5750

C++ regular expression to split string into an array

I am trying to write a handler to extract the parameters from a function call. The parameters appear between ( and ) and are delimited by a comma ','. A parameter may also be an array, which is itself comma-delimited and wrapped in [].

Examples of what I'm trying to decode:

    testA(aaaa, [bbbb,cccc,dddd], eeee)

or

    testB([aaaa,bbbb,cccc], dddd, [eeee,ffff])

Any combination and any number of parameters is possible. What I want from these is a list containing:

for testA:

    0 : aaaa
    1 : [bbbb,cccc,dddd]
    2 : eeee

for testB:

    0 : [aaaa,bbbb,cccc]
    1 : dddd
    2 : [eeee,ffff]

I'm writing a parser that gives me this result, but I would prefer a regular expression that does the same.

This is my hand-coded solution, which works, written in C++ for Qt 5.6:

    int intOpSB, intPStart;
    //Analyse and count the parameters
    intOpSB = intPStart = 0;
    for( int p=0; p<strParameters.length(); p++ ) {
        const QChar qc = strParameters.at(p);

        //Track whether we are inside a [] array
        if ( qc == clsXMLnode::mcucOpenSquareBracket ) {
            intOpSB++;
            continue;
        } else if ( qc == clsXMLnode::mcucCloseSquareBracket ) {
            intOpSB--;
            continue;
        }
        //A parameter ends at a delimiter outside [] or at the last character
        if ( (intOpSB == 0 && qc == clsXMLnode::mcucArrayDelimiter)
        || p == strParameters.length() - 1 ) {
            if ( strParameters.at(intPStart) == clsXMLnode::mcucArrayDelimiter ) {
                //Skip over a leading array delimiter
                intPStart++;
            }
            if ( intPStart > p ) {
                continue;
            }
            int intEnd = p;
            while( true ) {
                if ( intEnd > 0 && (strParameters.at(intEnd) == clsXMLnode::mcucArrayDelimiter) ) {
                    //We don't want a trailing delimiter in the parameter
                    intEnd--;
                } else {
                    break;
                }
            }
            if ( intEnd > intPStart ) {
                QString strParameter = strParameters.mid(intPStart, intEnd - intPStart + 1);
                //Update remaining parameters, skipping the parameter and any delimiter
                strParameters = strParameters.mid(strParameter.length() + 1);
                //Remove any quotes
                strParameter = strParameter.replace("\"", "");
                strParameter = strParameter.replace("\'", "");
                //Add the parameter
                mslstParameters.append(strParameter);
                //Reset parameter start and rescan the remaining text from the beginning
                intPStart = 0;
                p = -1;
            }
        }
    }

References:

    mcucOpenSquareBracket is a constant defined as '['
    mcucCloseSquareBracket is a constant defined as ']'
    mcucArrayDelimiter is a constant defined as ','
    mslstParameters is a member defined as QStringList
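For comparison, the depth-counting idea described above can be sketched in standard C++ without Qt. This is only an illustration under stated assumptions, not the code from the question: `splitParameters` and `trim` are hypothetical names, the input is assumed to be the text between the parentheses, and quote removal is left out.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: strips leading and trailing spaces and tabs.
static std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical function: splits on commas that are not inside [].
std::vector<std::string> splitParameters(const std::string& params) {
    std::vector<std::string> out;
    std::string current;
    int depth = 0;  // number of currently open '[' brackets
    for (char c : params) {
        if (c == '[') {
            ++depth;
        } else if (c == ']' && depth > 0) {
            --depth;
        }
        if (c == ',' && depth == 0) {
            // Top-level delimiter: the current parameter is complete.
            out.push_back(trim(current));
            current.clear();
        } else {
            current += c;
        }
    }
    // Emit the final parameter unless the input was completely empty.
    if (!current.empty() || !out.empty()) {
        out.push_back(trim(current));
    }
    return out;
}
```

Calling `splitParameters("aaaa, [bbbb,cccc,dddd], eeee")` should give the three entries listed for testA. Building each parameter in a separate buffer avoids having to restart the scan after each match.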

Upvotes: 0

Views: 325

Answers (2)

Yakk - Adam Nevraumont

Reputation: 275310

    // The "s" suffix needs: using namespace std::string_literals;
    auto term = "(?:[^,<]*)"s;
    auto chain = "(?:(?:"+term+",)*"+term+")"s;

    auto clause = "(?:(?:"+term+")|(?:<" + chain + ">))"s;

    auto re_str = "^(?:("+term+")|(?:<("+chain+")>))" "(?:|,((?:"+clause+",)*"+clause+"))";

re_str takes your string, and splits off the first term or chain from the tail.

It returns up to 3 sub-matches. The first is a lone term. The second is a comma-delimited chain of terms. The third is the rest of the string after the ,.

The tail is going to be empty, or another string that can be parsed using the above regular expression.

Chains of terms can be parsed by the same regular expression.

live example.

I matched <> delimited chains of terms, not [], because [ and ] have to be escaped as \\[ and \\] in a C++ string literal and I got bored of typing \\.

You also want to discard whitespace around clauses. I omitted that; it should be easy to stitch in.
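One way to drive this regular expression is to keep matching the tail until nothing is left. The following is a minimal sketch, assuming `std::regex` with its default ECMAScript grammar and input that uses <> for arrays and contains no whitespace, as in this answer. The function name `splitClauses` is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

using namespace std::string_literals;  // enables the "s" suffix

// Hypothetical driver: peels one clause off the front per iteration.
std::vector<std::string> splitClauses(std::string input) {
    auto term = "(?:[^,<]*)"s;
    auto chain = "(?:(?:"+term+",)*"+term+")"s;
    auto clause = "(?:(?:"+term+")|(?:<" + chain + ">))"s;
    auto re_str = "^(?:("+term+")|(?:<("+chain+")>))" "(?:|,((?:"+clause+",)*"+clause+"))";

    const std::regex re(re_str);
    std::vector<std::string> out;
    std::smatch m;
    while (std::regex_match(input, m, re)) {
        // Sub-match 2 is set for a <...> chain, otherwise sub-match 1 is a lone term.
        out.push_back(m[2].matched ? "<" + m[2].str() + ">" : m[1].str());
        if (!m[3].matched) {
            break;  // no tail left
        }
        input = m[3].str();  // continue with the rest after the comma
    }
    return out;
}
```

With `"aaaa,<bbbb,cccc,dddd>,eeee"` this should yield `aaaa`, `<bbbb,cccc,dddd>` and `eeee`. To split a chain into its terms, the contents of sub-match 2 can be passed to the same function.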

Upvotes: 2

Please Ignore

Reputation: 1

I have this regex that should work.

    \[.*?\]|([^,\s]+)

See here at Regexr
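A minimal sketch of applying this pattern from C++, assuming `std::regex` and `std::sregex_iterator` with the default ECMAScript grammar. The function name `tokenize` is hypothetical. Each full match is one parameter.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical function: collects every match of the pattern as one parameter.
std::vector<std::string> tokenize(const std::string& params) {
    // Either a bracketed array, or a run of characters without commas or whitespace.
    static const std::regex re(R"(\[.*?\]|([^,\s]+))");
    std::vector<std::string> out;
    for (std::sregex_iterator it(params.begin(), params.end(), re), end; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Note that the lazy `\[.*?\]` stops at the first `]`, so nested arrays are not handled, and a non-array parameter that contains whitespace is split into several tokens.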

Upvotes: 0
