ghostmansd

Reputation: 3465

Regex: one argument and several arguments

Could you please explain how I can write a regex that will match (arg1), (arg1, arg2), (arg1, arg2, xarg, zarg), etc.? Every name is an ASCII string that always starts with a character from [A-Za-z]. Here is what I've tried: "("[A-Za-z][a-z0-9]*(,)?([A-Za-z][a-z0-9]*)?")". Thanks!

Note: The regex must work in flex.

Upvotes: 0

Views: 103

Answers (2)

rici

Reputation: 241721

I'm not sure that flex is the right tool here, since you would usually use it to separate inputs like that into separate tokens. However, it's certainly possible:

"("[[:alpha:]][[:alnum:]]*(,[[:alpha:]][[:alnum:]]*)*")"

That will match (arg1) (arg1,arg2) but it won't match ( arg1 ) or (arg1, arg2). If you want to ignore spaces everywhere it gets a bit wordier.
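To check which inputs the strict pattern accepts, here is a rough Python translation. This is only a sketch: Python's `re` module has no POSIX bracket classes, so `[[:alpha:]]` and `[[:alnum:]]` are spelled out as ASCII ranges (which assumes the C locale), and the names `STRICT` and `matches_strict` are made up for illustration.

```python
import re

# Assumption: ASCII-only identifiers, standing in for flex's
# [[:alpha:]][[:alnum:]]* in the C locale.
ID = r"[A-Za-z][A-Za-z0-9]*"

# "(" ID ("," ID)* ")" with no whitespace allowed anywhere.
STRICT = re.compile(r"\(" + ID + r"(?:," + ID + r")*\)")


def matches_strict(text):
    """Return True if the whole string is a parenthesized id list."""
    return STRICT.fullmatch(text) is not None


for sample in ["(arg1)", "(arg1,arg2)", "( arg1 )", "(arg1, arg2)"]:
    print(repr(sample), matches_strict(sample))
```

The first two samples should be accepted and the last two rejected, because of the spaces.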

This sort of thing is a lot more readable if you use lex definitions:

ID      [[:alpha:]][[:alnum:]]*

%%

"("{ID}(","{ID})*")"

or, with space matching:

/* Make sure you're in the C locale when you compile. Or adjust
 * the definition accordingly. Perhaps you wanted to allow other 
 * characters in IDs.
 */
ID      [[:alpha:]][[:alnum:]]*
/* OWS = Optional White Space.*/
/* Flex defines blank as "space or tab" */
OWS     [[:blank:]]*
COMMA   {OWS}","{OWS}
OPEN    "("{OWS}
CLOSE   {OWS}")"

%%

{OPEN}{ID}({COMMA}{ID})*{CLOSE}  { /* Got a parenthesized list of ids */ }

Final note: This won't match () either; there has to be at least one id. If you want to include it as well you can make the part between the parentheses optional:

{OPEN}({ID}({COMMA}{ID})*)?{CLOSE}  { /* Got a parenthesized        */
                                      /* possibly empty list of ids */ }
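If you want to experiment with the whitespace-tolerant patterns outside flex, the definitions can be mirrored in Python by building the regex from the same pieces. Treat this as a sketch under a few assumptions: identifiers are ASCII only, `[[:blank:]]` is approximated as space or tab, and `NONEMPTY`, `MAYBE_EMPTY` and `parse_ids` are hypothetical names, not anything flex provides.

```python
import re

# Mirrors of the flex definitions (assumed ASCII / C locale).
ID = r"[A-Za-z][A-Za-z0-9]*"
OWS = r"[ \t]*"                 # optional white space: space or tab
COMMA = OWS + "," + OWS
OPEN = r"\(" + OWS
CLOSE = OWS + r"\)"

# At least one id between the parentheses.
NONEMPTY = re.compile(OPEN + ID + "(?:" + COMMA + ID + ")*" + CLOSE)

# The id list is optional, so "()" and "( )" are accepted too.
MAYBE_EMPTY = re.compile(
    OPEN + "(?:" + ID + "(?:" + COMMA + ID + ")*)?" + CLOSE
)


def parse_ids(text):
    """Return the list of ids, or None if text is not a valid list."""
    if MAYBE_EMPTY.fullmatch(text) is None:
        return None
    # After validation only ids, commas, blanks and parentheses remain,
    # so scanning for ids is enough to recover the names.
    return re.findall(ID, text)


print(parse_ids("( arg1 , arg2 )"))
print(parse_ids("()"))
print(parse_ids("(arg1,)"))
```

This also shows why separate tokens are usually easier to work with: once the whole list is matched as one lexeme, the names have to be pulled out again in a second pass.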

Upvotes: 1

wim

Reputation: 362687

Something like this?

>>> import re
>>> s = '''Could you explain, please, how can I make regex that will match (arg1), (arg1, arg2), (arg1, arg2, xarg, zarg), etc. Every name is an ASCII string which always starts with symbol [A-Za-z]. Here is what I've tried: "("[A-Za-z][a-z0-9]*(,)?([A-Za-z][a-z0-9]*)?")". Thanks!'''
>>> re.findall(r'\([A-Za-z]?arg[0-9]?(?:, [A-Za-z]?arg[0-9]?)*\)', s)
['(arg1)', '(arg1, arg2)', '(arg1, arg2, xarg, zarg)']
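Note that the pattern above only accepts names built around the literal text `arg`. A more general variant, sketched here on the assumption that a name is one ASCII letter followed by any number of ASCII letters or digits, and that optional whitespace may follow each comma, could look like this (the sample text and the name `find_arg_lists` are made up for illustration):

```python
import re

# Assumed identifier shape: a letter, then letters or digits.
NAME = r"[A-Za-z][A-Za-z0-9]*"

# "(" NAME ("," optional-whitespace NAME)* ")"
ARG_LIST = re.compile(r"\(" + NAME + r"(?:,\s*" + NAME + r")*\)")


def find_arg_lists(text):
    """Return every parenthesized name list found in text."""
    return ARG_LIST.findall(text)


sample = "call (arg1), then (arg1, arg2) and (arg1, arg2, xarg, zarg); not (1bad) or ()"
print(find_arg_lists(sample))
```

Because the only group is non-capturing, `findall` returns the full matched strings rather than the contents of a group.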

Upvotes: 1
