senseiwa

Reputation: 2499

Regexp for python-like function parameters parsing

I am trying to use questions like this one to devise a regexp that matches a call and yields the function name and all of its parameters, in a very simplified Python-like syntax like the following:

mycall(x, y, hello)

with the desired results:

Of course it should also match noparams() and any number of parameters. As for my simplifications, I only need parameter names; I don't allow default parameters or anything other than a comma-separated list of names.

My attempts with variants of "(\\s*)([A-Za-z0-9_])+\\(\\)", just to match a function name with leading spaces, are failing with this code:

    std::regex fnregexp(s);

    std::smatch pieces_match;

    if (std::regex_match(q, pieces_match, fnregexp))
    {
        std::cout << ">>>> '" << q << "'" << std::endl;

        for (size_t i = 0; i < pieces_match.size(); ++i)
        {
            std::ssub_match sub_match = pieces_match[i];
            std::string piece = sub_match.str();
            std::cout << "  submatch " << i << ": '" << piece << "'" << std::endl;
        }
    }

I have the following output for " hello()":

>>>> '     hello()'
  submatch 0: '     hello()'
  submatch 1: '     '
  submatch 2: 'o'

With this very basic syntax, is it possible to find the name of the function and its parameters?

Cheers!

Upvotes: 0

Views: 129

Answers (2)

logi-kal

Reputation: 7880

Use this for the conformance check:

^\\s*[A-Za-z_]\\w* *\\( *(?:[A-Za-z_]\\w* *(?:, *[A-Za-z_]\\w* *)*)?\\)$

and, if the check passes, use this to extract the parts of the signature:

\\w+

The first match is the function name; the others are the parameters.

EDIT: The correct syntax for Python is [A-Za-z_][A-Za-z0-9_]*

Upvotes: 1

besc

Reputation: 2647

Matching simple function declarations with a regex is feasible. For more complicated things you have exactly the right idea in going with a real parser like Boost Spirit.

The bug in your question is a misplaced closing parenthesis in the regex. Compare:

"(\\s*)([A-Za-z0-9_])+\\(\\)" // yours
"(\\s*)([A-Za-z0-9_]+)\\(\\)" // correct

The capture group in your version matches only a single character at a time, and the + repeats the whole group. A repeated group keeps only its last iteration, which is why you get the o. The correct version includes the + inside the group and captures hello as expected.

Upvotes: 1
