While True

Reputation: 423

Pyparsing with Optional at left

I have something like this:

from pyparsing import Keyword, Optional, Regex, Word, alphanums, alphas

IDENTIFIER = Word(alphas + '_', alphanums + '_')  # words
GENERIC_TYPE = Regex(r'[a-zA-Z_]+[a-zA-Z0-9_]*(<[a-zA-Z0-9_]+>)?')  # List<string> or int
AMF = Keyword('public') | Keyword('private') | Keyword('protected')  # method access modifier
SFMF = Optional(Keyword('static')) & Optional(Keyword('final'))  # static and final modifiers, in any order

For this example:

res = (Optional(AMF) + 
       SFMF + 
       IDENTIFIER).parseString('Method')
print(res)

it prints: ['Method'] but if I add Optional(GENERIC_TYPE):

res = (Optional(AMF) +
       SFMF +
       Optional(GENERIC_TYPE) +
       IDENTIFIER).parseString(text)
print(res)

it prints ['int', 'Method'] for text='int Method' BUT raises an exception for 'final Method' (or just 'Method'):

pyparsing.ParseException: Expected W:(abcd...,abcd...) (at char 12), (line:1, col:13)

It looks like pyparsing ignores the Optional: since GENERIC_TYPE is optional (like much of what precedes it), I would expect the parser to skip it and go on to parse the IDENTIFIER part.

UPDATE:

The problem seems to be in the parsing logic. If two consecutive patterns can match the same text and the first one is Optional, the parser gives the text to the Optional one and never checks whether it should belong to the second. For example:

m = Optional('M') + Literal('M')
m.parseString('M')

The parser matches 'M' to the Optional part and then has nothing left for the non-optional Literal part, so it fails.

So the question now is: can I parse it so that 'M' is matched by the second element? The pattern may not be at the end of the string or line, so I can't rely on that.

Upvotes: 1

Views: 959

Answers (1)

PaulMcG

Reputation: 63782

I would say, "GENERIC_TYPEs have to be followed by an IDENTIFIER". So to clear up the issue with your grammar, rewrite res as:

res = (Optional(AMF) +
       SFMF +
       Optional(GENERIC_TYPE + FollowedBy(IDENTIFIER)) +
       IDENTIFIER).parseString(text)

You could also write this as:

res = (Optional(AMF) +
       SFMF +
       (GENERIC_TYPE + IDENTIFIER | IDENTIFIER)).parseString(text)
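
As a quick check of the alternation form, here is a minimal sketch that rebuilds the grammar from the question and runs it on the three inputs mentioned there. The helper name parse_declaration is made up for illustration, and writing the regex as a raw string without the escaped angle brackets is my assumption about what was intended.

```python
from pyparsing import Keyword, Optional, Regex, Word, alphanums, alphas

IDENTIFIER = Word(alphas + '_', alphanums + '_')
GENERIC_TYPE = Regex(r'[a-zA-Z_]+[a-zA-Z0-9_]*(<[a-zA-Z0-9_]+>)?')
AMF = Keyword('public') | Keyword('private') | Keyword('protected')
SFMF = Optional(Keyword('static')) & Optional(Keyword('final'))

# Try "type followed by name" first; fall back to a bare name.
declaration = (Optional(AMF) +
               SFMF +
               (GENERIC_TYPE + IDENTIFIER | IDENTIFIER))


def parse_declaration(text):
    """Hypothetical helper: return the matched tokens as a plain list."""
    return declaration.parseString(text).asList()


for sample in ('Method', 'final Method', 'int Method'):
    print(sample, '->', parse_declaration(sample))
```

Because + binds more tightly than |, the last term reads as (GENERIC_TYPE + IDENTIFIER) | IDENTIFIER, so the bare IDENTIFIER is only tried when the two-token form fails.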

Pyparsing does not do any lookahead the way a regex does; you have to include it explicitly in your grammar definition.
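
The same point can be seen on the reduced Optional('M') + Literal('M') case from the question. This is only a sketch; the variable names greedy and guarded are mine.

```python
from pyparsing import FollowedBy, Literal, Optional, ParseException

# Greedy version: Optional('M') consumes the only 'M',
# so the mandatory Literal('M') finds nothing and the parse fails.
greedy = Optional('M') + Literal('M')
try:
    greedy.parseString('M')
    greedy_ok = True
except ParseException:
    greedy_ok = False

# Guarded version: the optional part matches only when another 'M' follows it.
# FollowedBy checks the upcoming text without consuming it.
guarded = Optional(Literal('M') + FollowedBy(Literal('M'))) + Literal('M')
single = guarded.parseString('M').asList()
double = guarded.parseString('MM').asList()

print(greedy_ok, single, double)
```

With the lookahead in place, a lone 'M' falls through to the mandatory Literal, while 'MM' gives one 'M' to each part.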

Also, since IDENTIFIER will match any string of characters, you might want to define an expression like 'keyword' that matches all the language keywords, and then define IDENTIFIER as:

keyword = MatchFirst(map(Keyword,"public private protected static final".split()))
IDENTIFIER = ~keyword + Word(alphas + '_', alphanums + '_')
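
A small sketch of how that negative lookahead behaves, assuming the same five keywords. The helper is_identifier is hypothetical and only exists to turn the parse result into a boolean.

```python
from pyparsing import Keyword, MatchFirst, ParseException, Word, alphanums, alphas

keyword = MatchFirst([Keyword(k) for k in "public private protected static final".split()])
# ~keyword is a negative lookahead: fail if a keyword starts here, consume nothing otherwise.
IDENTIFIER = ~keyword + Word(alphas + '_', alphanums + '_')


def is_identifier(text):
    """Hypothetical helper: True if the whole text parses as an IDENTIFIER."""
    try:
        IDENTIFIER.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


for sample in ('Method', 'final', 'finalize'):
    print(sample, '->', is_identifier(sample))
```

Keyword only matches when the word is not immediately followed by another identifier character, so a name such as finalize is still accepted even though it starts with final.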

Finally, you might want GENERIC_TYPE to handle more than just simple container<type> definitions, like Map<String,String>, Map<String,List<String>> or even Map<String,Map<String,Map<String,Map<String,Map<String,String>>>>>.

This will parse all of those:

GENERIC_TYPE = Group(IDENTIFIER + nestedExpr('<', '>', content=delimitedList(IDENTIFIER)))
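
Here is a rough sketch of that expression in use, with a plain IDENTIFIER and a made-up helper parse_type. Note that, as written, the expression requires the <...> part, so a bare type such as int would need something like Optional around the nestedExpr; that caveat is my observation, not part of the definition above.

```python
from pyparsing import (Group, ParseException, Word, alphanums, alphas,
                       delimitedList, nestedExpr)

IDENTIFIER = Word(alphas + '_', alphanums + '_')
GENERIC_TYPE = Group(IDENTIFIER + nestedExpr('<', '>', content=delimitedList(IDENTIFIER)))


def parse_type(text):
    """Hypothetical helper: return the nested token structure as plain lists."""
    return GENERIC_TYPE.parseString(text).asList()


simple = parse_type('List<string>')
nested = parse_type('Map<String,List<String>>')
print(simple)
print(nested)

# A bare type name has no <...> part, so this expression alone rejects it.
try:
    parse_type('int')
    bare_ok = True
except ParseException:
    bare_ok = False
print(bare_ok)
```

Each level of angle brackets becomes one nested list, and the type parameters of an inner generic appear as a sub-list right after its name.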

Upvotes: 2
