Reputation: 423
I have something like this:
IDENTIFIER = Word(alphas + '_', alphanums + '_') #words
GENERIC_TYPE = Regex(r'[a-zA-Z_]+[a-zA-Z0-9_]*(<[a-zA-Z0-9_]+>)?') #List<string> or int
AMF = Keyword('public') | Keyword('private') | Keyword('protected') #method modifier
SFMF = Optional(Keyword('static')) & Optional(Keyword('final')) #static and final modifiers
For this example:
res = (Optional(AMF) +
SFMF +
IDENTIFIER).parseString('Method')
print(res)
it prints: ['Method']
but if I add Optional(GENERIC_TYPE):
res = (Optional(AMF) +
SFMF +
Optional(GENERIC_TYPE) +
IDENTIFIER).parseString(text)
print(res)
it prints ['int', 'Method'] for text='int Method', BUT raises an exception for 'final Method' (or just 'Method'):
pyparsing.ParseException: Expected W:(abcd...,abcd...) (at char 12), (line:1, col:13)
It looks like pyparsing doesn't honor the Optional: if GENERIC_TYPE is optional (like a lot of the stuff before it), the parser should skip it and go on to parse the IDENTIFIER part.
UPDATE:
The problem seems to be in the parsing logic. If two patterns can match the same text and the first of them is Optional, the parser gives the text to the Optional one and never checks whether it should have gone to the second. For example:
m = Optional('M') + Literal('M')
m.parseString('M')
The parser matches 'M' to the Optional part and then finds nothing left for the non-optional Literal part.
So the question now is: can I make the parser match the text to the second one? The expression might not be at the end of the string or line, so I can't rely on that.
Upvotes: 1
Views: 959
Reputation: 63782
I would say, "GENERIC_TYPEs have to be followed by an IDENTIFIER". So to clear up the issue with your grammar, rewrite res as:
res = (Optional(AMF) +
SFMF +
Optional(GENERIC_TYPE + FollowedBy(IDENTIFIER)) +
IDENTIFIER).parseString(text)
You could also write this as:
res = (Optional(AMF) +
SFMF +
(GENERIC_TYPE + IDENTIFIER | IDENTIFIER)).parseString(text)
Pyparsing does not do any lookahead or backtracking the way a regex does; you have to include the lookahead in your grammar definition explicitly.
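To see this on the small 'M' example from the question, here is a minimal sketch. The names greedy and guarded are just for illustration, and it assumes a pyparsing version that still accepts the camelCase parseString name:

```python
from pyparsing import FollowedBy, Literal, Optional, ParseException

# Greedy version: Optional('M') consumes the only 'M', so Literal('M') has nothing left.
greedy = Optional("M") + Literal("M")
try:
    greedy.parseString("M")
    greedy_ok = True
except ParseException:
    greedy_ok = False

# Guarded version: the optional 'M' is taken only when another 'M' follows it.
guarded = Optional(Literal("M") + FollowedBy("M")) + Literal("M")
one = guarded.parseString("M").asList()
two = guarded.parseString("MM").asList()

print(greedy_ok, one, two)
```

FollowedBy only peeks at the input and consumes nothing, so the trailing Literal still gets to match the text that was looked at.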
Also, since IDENTIFIER will match any word, keywords included, you might want to define an expression like 'keyword' that matches all the language keywords, and then define IDENTIFIER as:
keyword = MatchFirst([Keyword(k) for k in "public private protected static final".split()])
IDENTIFIER = ~keyword + Word(alphas + '_', alphanums + '_')
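A quick way to check what the negative lookahead does is a sketch like the one below. The helper is_identifier is hypothetical, added only for this demonstration:

```python
from pyparsing import Keyword, MatchFirst, ParseException, Word, alphanums, alphas

keyword = MatchFirst([Keyword(k) for k in "public private protected static final".split()])
IDENTIFIER = ~keyword + Word(alphas + "_", alphanums + "_")

def is_identifier(text):
    """Return True if the whole text parses as IDENTIFIER (illustrative helper)."""
    try:
        IDENTIFIER.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False

# A bare keyword is rejected, but a name that merely starts with one is fine,
# because Keyword only matches a complete word.
print(is_identifier("final"), is_identifier("finalValue"), is_identifier("Method"))
```

Using Keyword rather than Literal is what keeps names like finalValue valid.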
Finally, you might want GENERIC_TYPE to handle more than just simple container<type> definitions, like Map<String,String>, Map<String,List<String>> or even Map<String,Map<String,Map<String,Map<String,Map<String,String>>>>>.
This will parse all of those:
GENERIC_TYPE = Group(IDENTIFIER + nestedExpr('<', '>', content=delimitedList(IDENTIFIER)))
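Note that as written, the nestedExpr part is required, so a bare type like int would no longer match. The sketch below wraps it in Optional to cover that case; the Optional and the type_args name are my assumptions for illustration, not something the grammar has to use:

```python
from pyparsing import (Group, Keyword, MatchFirst, Optional, Word,
                       alphanums, alphas, delimitedList, nestedExpr)

keyword = MatchFirst([Keyword(k) for k in "public private protected static final".split()])
IDENTIFIER = ~keyword + Word(alphas + "_", alphanums + "_")

# Type arguments: a '<' ... '>' group holding comma-separated names and nested groups.
type_args = nestedExpr("<", ">", content=delimitedList(IDENTIFIER))

# Assumption: Optional(...) is added here so that a bare type such as 'int' still matches.
GENERIC_TYPE = Group(IDENTIFIER + Optional(type_args))

nested = GENERIC_TYPE.parseString("Map<String,List<String>>").asList()
plain = GENERIC_TYPE.parseString("int").asList()
print(nested)
print(plain)
```

Each level of angle brackets comes back as its own nested list, which makes it easy to walk the type arguments afterwards.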
Upvotes: 2