I'm using Lark in Python. My grammar contains a line like this:
ipv6_comp: [ipv6_hex (":" ipv6_hex)~0..5] "::" [ipv6_hex (":" ipv6_hex)~0..5]
My transformer has the corresponding method:
def ipv6_comp(self, args):
However, args looks like this:
<class 'list'>: ['2001', 'db8', '85a3', '8a2e', '370', '7334']
The literals are not included, so from this structure it is impossible to tell which of these two addresses the original was:
2001:db8:85a3::8a2e:370:7334
2001:db8:85a3:8a2e::370:7334
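To see the ambiguity concretely, here is a small standalone sketch (plain Python, no Lark involved) that imitates dropping the literals by splitting on ":" and discarding the empty pieces. The helper name drop_punctuation is hypothetical. Both addresses collapse to the same list, while Python's standard ipaddress module confirms they are different addresses:

```python
import ipaddress


def drop_punctuation(address):
    """Return the hex groups of an address with every ':' and '::' thrown away.

    This only imitates what the transformer receives when the literals
    are filtered out; it is not how Lark works internally.
    """
    return [group for group in address.split(":") if group]


first = "2001:db8:85a3::8a2e:370:7334"
second = "2001:db8:85a3:8a2e::370:7334"

# Same groups once the separators are gone...
print(drop_punctuation(first) == drop_punctuation(second))  # True
# ...but the addresses themselves differ.
print(ipaddress.IPv6Address(first) == ipaddress.IPv6Address(second))  # False
```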
I could give the literals their own rules, like this:
colon: ":"
doublecolon: "::"
ipv6_comp: [ipv6_hex (colon ipv6_hex)~0..5] doublecolon [ipv6_hex (colon ipv6_hex)~0..5]
which might even be cleaner. However, the grammar I use is semi-automatically generated, so this would take more manual labor.
Is there a way for my transformer method ipv6_comp to also receive the literals in its args parameter?
There are three approaches to solving your problem:
1. Turn ipv6_comp into a terminal. Lark will then match the whole address with a single regexp and return all of the matched characters:
IPV6_COMP: [HEX (":" HEX)~0..5] "::" [HEX (":" HEX)~0..5]
2. Give your punctuation names (what you suggested, but as terminals rather than rules), and use those names in the rule:
COLON: ":"
3. Use the ! operator to keep punctuation (that is, unnamed symbols) in the rule:
!ipv6_comp: [ipv6_hex (":" ipv6_hex)~0..5] "::" [ipv6_hex (":" ipv6_hex)~0..5]
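With either named punctuation terminals or the ! operator, the separators reach the transformer as tokens alongside the hex groups, so ipv6_comp can work out where the "::" gap is. The sketch below assumes args is a flat list like ['2001', ':', 'db8', ':', '85a3', '::', '8a2e', ':', '370', ':', '7334'], with ipv6_hex already turned into strings as in the question's output (Lark's Token is a str subclass, so comparing against ":" and "::" works). It also skips None in case absent optional parts show up as placeholders. The function name expand_compressed is hypothetical; you would call it from your ipv6_comp method:

```python
def expand_compressed(args):
    """Rebuild the full 8-group form of a compressed IPv6 address.

    Assumes `args` holds hex groups interleaved with ":" and "::" tokens,
    possibly with None for optional parts that were not matched.
    """
    left, right = [], []
    current = left
    for tok in args:
        if tok is None or tok == ":":
            continue  # plain separators and placeholders carry no group
        if tok == "::":
            current = right  # everything after "::" belongs to the right side
            continue
        current.append(str(tok))

    missing = 8 - len(left) - len(right)
    if missing < 1:
        # "::" must stand for at least one all-zero group
        raise ValueError("too many groups for a compressed IPv6 address")
    groups = left + ["0"] * missing + right
    return ":".join(group.zfill(4) for group in groups)


# Hypothetical token lists for the two addresses from the question:
print(expand_compressed(['2001', ':', 'db8', ':', '85a3', '::', '8a2e', ':', '370', ':', '7334']))
# 2001:0db8:85a3:0000:0000:8a2e:0370:7334
print(expand_compressed(['2001', ':', 'db8', ':', '85a3', ':', '8a2e', '::', '370', ':', '7334']))
# 2001:0db8:85a3:8a2e:0000:0000:0370:7334
```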
I recommend the first solution: it is faster to parse, and once parsing is done you can use a dedicated library to split the IPv6 address into its components.
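As a sketch of that last step, assuming the whole address arrives as the text of a single IPV6_COMP token, Python's standard ipaddress module can validate it and expand it into its eight groups. The helper name ipv6_components is hypothetical, and ipaddress is just one possible choice of library:

```python
import ipaddress


def ipv6_components(text):
    """Validate an IPv6 address string and return its eight 16-bit groups."""
    address = ipaddress.IPv6Address(text)  # raises ValueError if malformed
    return address.exploded.split(":")


print(ipv6_components("2001:db8:85a3::8a2e:370:7334"))
# ['2001', '0db8', '85a3', '0000', '0000', '8a2e', '0370', '7334']
print(ipv6_components("2001:db8:85a3:8a2e::370:7334"))
# ['2001', '0db8', '85a3', '8a2e', '0000', '0000', '0370', '7334']
```

Because the token text still contains the "::", the two addresses from the question come out different, and the transformer never has to reason about separators itself.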