Benjamin Basmaci

Reputation: 2557

Lark: How to make literals appear in the tree

Using Python. In my grammar, I have a line like this:

ipv6_comp: [ipv6_hex (":" ipv6_hex)~0..5] "::" [ipv6_hex (":" ipv6_hex)~0..5]

My transformer has the appropriate function

def ipv6_comp(self, args):

However, args looks like this:

<class 'list'>: ['2001', 'db8', '85a3', '8a2e', '370', '7334']

The literals are not included. From this structure, it's obviously impossible for me to know which of these the original IP looked like:

2001:db8:85a3::8a2e:370:7334
2001:db8:85a3:8a2e::370:7334

I think I could just mask the literals with their own rules like

colon: ":"
doublecolon: "::"
ipv6_comp: [ipv6_hex (colon ipv6_hex)~0..5] doublecolon [ipv6_hex (colon ipv6_hex)~0..5]

which might even be cleaner. However, the grammar I use is semi-automatically generated and this would take more manual labor.

Is there a way for my transformer function ipv6_comp to also include literals in the args parameter?

Upvotes: 2

Views: 681

Answers (1)

Erez

Reputation: 1430

There are three approaches to solve your problem.

  1. Turn ipv6_comp into a terminal. Then Lark will match it all in a single regexp, and return all its matched characters:

    IPV6_COMP: [HEX (":" HEX)~0..5] "::" [HEX (":" HEX)~0..5]

  2. Provide a name for your punctuation (what you suggested, but as terminals, since named terminals are kept in the tree):

    COLON: ":"

  3. Use the ! operator to include punctuation (that is: unnamed symbols) in the rule:

    !ipv6_comp: [ipv6_hex (":" ipv6_hex)~0..5] "::" [ipv6_hex (":" ipv6_hex)~0..5]

I recommend the first solution, because it is faster to parse, and you can use a dedicated library to parse the IPv6 address into components, after the parse is done.
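
As one possible "dedicated library", Python's standard ipaddress module can expand the matched string once the parse is done. This is only a sketch; ipv6_groups is a hypothetical helper name, and the input is assumed to be the full text matched by the terminal:

```python
import ipaddress


def ipv6_groups(text):
    """Return the eight 16-bit groups of an IPv6 address as zero-padded hex strings."""
    # .exploded expands "::" into the zero groups it stands for.
    return ipaddress.IPv6Address(text).exploded.split(":")


print(ipv6_groups("2001:db8:85a3::8a2e:370:7334"))
# ['2001', '0db8', '85a3', '0000', '0000', '8a2e', '0370', '7334']
print(ipv6_groups("2001:db8:85a3:8a2e::370:7334"))
# ['2001', '0db8', '85a3', '8a2e', '0000', '0000', '0370', '7334']
```

The two addresses from the question now yield different group lists, which is exactly the information that was lost when the literals were filtered out.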

Upvotes: 3
