Reputation: 3465
Could you please explain how I can write a regex that matches (arg1), (arg1, arg2), (arg1, arg2, xarg, zarg), etc.? Every name is an ASCII string that always starts with a character in [A-Za-z]. Here is what I've tried: "("[A-Za-z][a-z0-9]*(,)?([A-Za-z][a-z0-9]*)?")". Thanks!
Note: The regex must work in flex.
Upvotes: 0
Views: 103
Reputation: 241721
I'm not sure that flex is the right tool here, since you would usually use it to separate inputs like that into separate tokens. However, it's certainly possible:
"("[[:alpha:]][[:alnum:]]*(,[[:alpha:]][[:alnum:]]*)*")"
That will match (arg1) and (arg1,arg2), but it won't match ( arg1 ) or (arg1, arg2). If you want to ignore spaces everywhere, it gets a bit wordier.
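If you want to check that behaviour quickly outside of flex, here is a rough Python translation of the pattern. It is only a sketch: Python's re module has no [[:alpha:]] or [[:alnum:]] classes, so I've assumed the C locale and spelled out the ASCII ranges, and the names ID, STRICT and matches_strict are made up for the example.

```python
import re

# Assumption: C locale, so [[:alpha:]] ~ [A-Za-z] and [[:alnum:]] ~ [A-Za-z0-9].
ID = r"[A-Za-z][A-Za-z0-9]*"

# Same shape as the flex pattern: "(" ID ("," ID)* ")" with no spaces allowed.
STRICT = re.compile(r"\(" + ID + r"(?:," + ID + r")*\)")


def matches_strict(text: str) -> bool:
    """Return True if the whole string is a parenthesized id list without spaces."""
    return STRICT.fullmatch(text) is not None


for sample in ["(arg1)", "(arg1,arg2)", "( arg1 )", "(arg1, arg2)"]:
    print(repr(sample), matches_strict(sample))
```

The first two samples should be accepted and the last two rejected, because nothing in the pattern consumes a space.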
This sort of thing is a lot more readable if you use lex definitions:
ID [[:alpha:]][[:alnum:]]*
%%
"("{ID}(","{ID})*")"
or, with space matching:
/* Make sure you're in the C locale when you compile. Or adjust
* the definition accordingly. Perhaps you wanted to allow other
* characters in IDs.
*/
ID [[:alpha:]][[:alnum:]]*
/* OWS = Optional White Space.*/
/* Flex defines blank as "space or tab" */
OWS [[:blank:]]*
COMMA {OWS}","{OWS}
OPEN "("{OWS}
CLOSE {OWS}")"
%%
{OPEN}{ID}({COMMA}{ID})*{CLOSE} { /* Got a parenthesized list of ids */ }
Final note: This won't match () either; there has to be at least one id. If you want to include it as well, you can make the part between the parentheses optional:
{OPEN}({ID}({COMMA}{ID})*)?{CLOSE} { /* Got a parenthesized, possibly empty list of ids */ }
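For anyone who wants to experiment with the named-definition approach without running flex, the same composition can be sketched in Python by building the pattern from string pieces. This is an approximation, not flex itself: the ASCII ranges again assume the C locale, [ \t] stands in for flex's [[:blank:]], and NONEMPTY_LIST and MAYBE_EMPTY_LIST are names invented for the example.

```python
import re

# Building blocks mirroring the flex definitions (ASCII / C locale assumed).
ID = r"[A-Za-z][A-Za-z0-9]*"
OWS = r"[ \t]*"            # optional white space: spaces or tabs
COMMA = OWS + "," + OWS
OPEN = r"\(" + OWS
CLOSE = OWS + r"\)"

# At least one id between the parentheses.
NONEMPTY_LIST = re.compile(f"{OPEN}{ID}(?:{COMMA}{ID})*{CLOSE}")

# The part between the parentheses is optional, so "()" is accepted too.
MAYBE_EMPTY_LIST = re.compile(f"{OPEN}(?:{ID}(?:{COMMA}{ID})*)?{CLOSE}")

for sample in ["(arg1)", "( arg1 , arg2 )", "()", "( )", "(arg1,,arg2)"]:
    print(
        repr(sample),
        NONEMPTY_LIST.fullmatch(sample) is not None,
        MAYBE_EMPTY_LIST.fullmatch(sample) is not None,
    )
```

The two patterns should only disagree on "()" and "( )"; a doubled comma is rejected by both.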
Upvotes: 1
Reputation: 362687
Something like this?
>>> import re
>>> s = '''Could you explain, please, how can I make regex that will match (arg1), (arg1, arg2), (arg1, arg2, xarg, zarg), etc. Every name is an ASCII string which always starts with symbol [A-Za-z]. Here is what I've tried: "("[A-Za-z][a-z0-9]*(,)?([A-Za-z][a-z0-9]*)?")". Thanks!'''
>>> re.findall(r'\([A-Za-z]?arg[0-9]?(?:, [A-Za-z]?arg[0-9]?)*\)', s)
['(arg1)', '(arg1, arg2)', '(arg1, arg2, xarg, zarg)']
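Note that the pattern above only recognizes names built around the literal text arg. A possible generalization, assuming the rule from the question (a letter followed by ASCII letters or digits) and optional whitespace around the commas, might look like the sketch below. ARG_LIST, NAME and find_arg_lists are hypothetical names, and this is plain Python re, so it does not answer the flex requirement.

```python
import re

# Assumption: a name is an ASCII letter followed by ASCII letters or digits.
ID = r"[A-Za-z][A-Za-z0-9]*"

# A parenthesized, comma-separated list of names, with optional whitespace.
ARG_LIST = re.compile(r"\(\s*" + ID + r"(?:\s*,\s*" + ID + r")*\s*\)")
NAME = re.compile(ID)


def find_arg_lists(text):
    """Return one list of names for every parenthesized list found in text."""
    return [NAME.findall(m.group(0)) for m in ARG_LIST.finditer(text)]


text = "match (arg1), (arg1, arg2), (arg1, arg2, xarg, zarg), etc."
print(find_arg_lists(text))
# [['arg1'], ['arg1', 'arg2'], ['arg1', 'arg2', 'xarg', 'zarg']]
```

Splitting each match into its names in a second pass keeps the main pattern simple, since a repeated capture group in re only keeps its last repetition.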
Upvotes: 1