Backslash conflict in Words and linebreak in Pyparsing

Question

I'm having a hard time with a grammar that allows '\' in parameter names (e.g. net\<8\>). However, '\' also serves as the line-continuation character (see Ex2). Ex1 works fine, but in Ex2 there is a conflict between the linebreak and identifier expressions.

Ex1: working (netlist.sp)

subckt INVERTER A Z gnd gnds vdd vdds
M1 (Z A vdd vdds) pmos w=0.4 l=0.1
M2 (Z A gnd gnds) nmos w=0.2 l=0.1
ends INVERTER

I1 (net1 net2 0 gnds! vdd! vdds!) INVERTER

subckt INVERTER_2 A Z gnd gnds vdd vdds
M1 (Z A vdd vdds) pmos w=0.4 l=0.1
M2 (Z A gnd gnds) nmos w=0.2 l=0.1
ends INVERTER_2

I2 (net\<8\> net2 0 gnds! vdd! vdds!) INVERTER_2

I3 (net1 net2 0 gnds! vdd! vdds!) INVERTER_2

Ex2: not working (netlist2.sp)

subckt INVERTER A Z gnd gnds vdd vdds
M1 (Z A vdd vdds) pmos w=0.4 l=0.1
M2 (Z A gnd gnds) nmos w=0.2 l=0.1
ends INVERTER

I1 (net1 net2 0 gnds! vdd! vdds!) INVERTER

subckt INVERTER_2 A Z gnd gnds \
                  vdd vdds
M1 (Z A vdd vdds) pmos w=0.4 l=0.1
M2 (Z A gnd gnds) nmos w=0.2 l=0.1
ends INVERTER_2

I2 (net\<8\> net2 0 gnds! vdd! vdds!) INVERTER_2

I3 (net1 net2 0 gnds! vdd! vdds!) INVERTER_2

The code

import pyparsing as pp
import json

EOL = pp.LineEnd().suppress() # end of line
linebreak = pp.Suppress(pp.Keyword('\\') + pp.LineEnd())
identifier = pp.Word(pp.alphanums + '_!<>\\')
number = pp.pyparsing_common.number
net = identifier
instref = identifier
instname = identifier
subcktname = identifier
subcktname_end = pp.Keyword("ends").suppress()
comment = pp.Suppress("//" + pp.SkipTo(pp.LineEnd()))
expression = pp.Word(pp.alphanums + '._*+-/()')

input_file = open('netlist.sp', 'r')
file_string = input_file.read()
input_file.close()

for t, s, e in parse_netlist().scanString(file_string):
    print(json.dumps(t.asDict()['netlist'], indent=2))

def parse_netlist():
        pp.ParserElement.setDefaultWhitespaceChars(' \t')

        nets = (pp.Optional(pp.Suppress('('))
                + pp.OneOrMore(net('net') | linebreak)
                + pp.Optional(pp.Suppress(')'))
               )

        inst_param_value = expression('expression')

        inst_parameter = pp.Dict(pp.Group(identifier('param_name')
                                  + pp.Suppress("=")
                                  + inst_param_value('param_value')
                                 ))

        parameters = pp.Group(pp.OneOrMore(inst_parameter | linebreak)
                             ).setResultsName('parameters')

        instance = pp.Dict(pp.Group(instname('inst_name')
                            + nets('nets')
                            + instref('reference')
                            + pp.Optional(parameters)
                            + EOL
                           )).setResultsName('instance', listAllMatches=True)

        subckt_core = pp.Group(pp.ZeroOrMore(instance | EOL | comment)
                              ).setResultsName('subckt_core', listAllMatches=True)

        subckt = pp.Group(pp.Keyword("subckt").suppress()
                          + subcktname('subckt_name')
                          + nets('nets')
                          + EOL
                          + subckt_core
                          + subcktname_end
                          + pp.matchPreviousExpr(subcktname).suppress()
                          + EOL
                         ).setResultsName('subcircuit', listAllMatches=True)

        netlist = pp.OneOrMore(subckt
                               | instance
                               | comment('comment')
                               | EOL
                              ).setResultsName('netlist') + pp.StringEnd()

        return netlist

Output of Ex1

[
  {
    "subckt_name": "INVERTER",
    "net": "vdds",
    "nets": [
      "A",
      "Z",
      "gnd",
      "gnds",
      "vdd",
      "vdds"
    ],
    "subckt_core": [
      {
        "instance": [
          {
            "M1": {
              "inst_name": "M1",
              "net": "vdds",
              "nets": [
                "Z",
                "A",
                "vdd",
                "vdds"
              ],
              "reference": "pmos",
              "parameters": {
                "w": "0.4",
                "l": "0.1"
              }
            }
          },
          {
            "M2": {
              "inst_name": "M2",
              "net": "gnds",
              "nets": [
                "Z",
                "A",
                "gnd",
                "gnds"
              ],
              "reference": "nmos",
              "parameters": {
                "w": "0.2",
                "l": "0.1"
              }
            }
          }
        ]
      }
    ]
  },
  {
    "I1": {
      "inst_name": "I1",
      "net": "vdds!",
      "nets": [
        "net1",
        "net2",
        "0",
        "gnds!",
        "vdd!",
        "vdds!"
      ],
      "reference": "INVERTER",
      "parameters": []
    }
  },
  {
    "subckt_name": "INVERTER_2",
    "net": "vdds",
    "nets": [
      "A",
      "Z",
      "gnd",
      "gnds",
      "vdd",
      "vdds"
    ],
    "subckt_core": [
      {
        "instance": [
          {
            "M1": {
              "inst_name": "M1",
              "net": "vdds",
              "nets": [
                "Z",
                "A",
                "vdd",
                "vdds"
              ],
              "reference": "pmos",
              "parameters": {
                "w": "0.4",
                "l": "0.1"
              }
            }
          },
          {
            "M2": {
              "inst_name": "M2",
              "net": "gnds",
              "nets": [
                "Z",
                "A",
                "gnd",
                "gnds"
              ],
              "reference": "nmos",
              "parameters": {
                "w": "0.2",
                "l": "0.1"
              }
            }
          }
        ]
      }
    ]
  },
  {
    "I2": {
      "inst_name": "I2",
      "net": "vdds!",
      "nets": [
        "net\\<8\\>",
        "net2",
        "0",
        "gnds!",
        "vdd!",
        "vdds!"
      ],
      "reference": "INVERTER_2",
      "parameters": []
    }
  },
  {
    "I3": {
      "inst_name": "I3",
      "net": "vdds!",
      "nets": [
        "net1",
        "net2",
        "0",
        "gnds!",
        "vdd!",
        "vdds!"
      ],
      "reference": "INVERTER_2",
      "parameters": []
    }
  }
]

Output of Ex2

[
  {
    "I2": {
      "inst_name": "I2",
      "net": "vdds!",
      "nets": [
        "INST_IN\\<8\\>",
        "net2",
        "0",
        "gnds!",
        "vdd!",
        "vdds!"
      ],
      "reference": "INVERTER2",
      "parameters": []
    }
  },
  {
    "I3": {
      "inst_name": "I3",
      "net": "vdds!",
      "nets": [
        "net1",
        "net2",
        "0",
        "gnds!",
        "vdd!",
        "vdds!"
      ],
      "reference": "INVERTER3",
      "parameters": []
    }
  }
]

The grammar:

Formatting Subcircuit Definitions:

subckt SubcircuitName [(] node1 ... nodeN [)]
[ parameters name1=value1 ... [nameN=valueN]]
.
.
.
instance, model, ic, or nodeset statements—or
further subcircuit definitions
.
.
.
ends [SubcircuitName]

Formatting the Instance Statement:

name [(]node1 ... nodeN[)] master [[param1=value1] ...[paramN=valueN]]

PaulMcG · Accepted Answer

Word is one of the most greedy and aggressive of all the repetition types in pyparsing. So your two expressions:

linebreak = pp.Suppress(pp.Keyword('\\') + pp.LineEnd())
identifier = pp.Word(pp.alphanums + '_!<>\\')

are going to conflict. Once an identifier starts scanning for matching characters, it will not look ahead to the next expression to see if it should stop.
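
To see the conflict in isolation, here is a minimal sketch. The names greedy_identifier and sample are mine, not from your code, and it assumes whitespace is limited to spaces and tabs as in your parser. Because the identifier alternative is tried first and accepts '\', it swallows the continuation backslash as a one-character identifier, and linebreak never gets a chance to match:

```python
import pyparsing as pp

# Only spaces and tabs are skippable, so newlines stay significant.
pp.ParserElement.setDefaultWhitespaceChars(' \t')

# Same definitions as in the question: '\' is a legal identifier character.
linebreak = pp.Suppress(pp.Keyword('\\') + pp.LineEnd())
greedy_identifier = pp.Word(pp.alphanums + '_!<>\\')

# Illustrative input: a net name, then a continuation backslash and a newline.
sample = "gnds \\\n    vdd"

# The Word matches the lone backslash, then parsing stops at the newline.
tokens = pp.OneOrMore(greedy_identifier | linebreak).parseString(sample).asList()
print(tokens)
```

The backslash comes back as an ordinary token and 'vdd' is never reached, which is the same failure mode as the subckt line in Ex2.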

To distinguish a '\' inside an identifier from one that marks a continuation, you have a good start with linebreak. Next, we need to remove the '\' from the characters in the identifier word:

identifier = pp.Word(pp.alphanums + '_!<>')

To add back the '\' in identifiers, we'll need to be more specific. Not just any '\' will do: we want only those that are not linebreaks (that is, those that are not at the end of the line). We can do that with a negative lookahead. Before accepting a backslash, first make sure it is not a line-breaking backslash:

backslash_that_is_not_a_linebreak = ~linebreak + '\\'

And now identifier will be the collection of one or more word items, which can be your identifier word as defined above, or a backslash that is not a linebreak.

identifier_word = pp.Word(pp.alphanums + '_!<>')
identifier = pp.OneOrMore(identifier_word | backslash_that_is_not_a_linebreak)

This gets us close, but if you use this identifier to parse "net\<8\>", you'll get:

['net', '\\', '<8', '\\', '>']

If you wrap identifier in a pyparsing Combine, then all should work fine:

identifier = pp.Combine(pp.OneOrMore(identifier_word | backslash_that_is_not_a_linebreak))

print(identifier.parseString(r"net\<8\>"))

gives:

['net\\<8\\>']

EDIT: In sum, here are the mods needed for this change:

backslash_that_is_not_a_linebreak = ~linebreak + '\\'
identifier_word = pp.Word(pp.alphanums + '_!<>')
identifier = pp.Combine(pp.OneOrMore(identifier_word | backslash_that_is_not_a_linebreak))
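
Putting those three lines together with your linebreak, here is a small self-contained sketch. The nets expression is simplified (no optional parentheses) and the sample string is made up for illustration; the whitespace call is placed first for the reason given in EDIT2:

```python
import pyparsing as pp

# Must run before any expression is built, so newlines are not skipped.
pp.ParserElement.setDefaultWhitespaceChars(' \t')

linebreak = pp.Suppress(pp.Keyword('\\') + pp.LineEnd())

# A backslash is accepted in an identifier only if it does not start a linebreak.
backslash_that_is_not_a_linebreak = ~linebreak + '\\'
identifier_word = pp.Word(pp.alphanums + '_!<>')
identifier = pp.Combine(pp.OneOrMore(identifier_word | backslash_that_is_not_a_linebreak))

# Simplified stand-in for the nets expression in the question.
nets = pp.OneOrMore(identifier | linebreak)

# Illustrative input: an escaped net name plus a continued line.
sample = "net\\<8\\> gnds \\\n    vdd vdds"
result = nets.parseString(sample).asList()
print(result)
```

The escaped name stays in one piece, the continuation backslash is suppressed, and the nets on the following line are picked up as part of the same list.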

EDIT2: These lines, declared in your method parse_netlist, need to be at the top of the module, right after importing pyparsing. Otherwise, all of your expressions like linebreak, which are created before that call runs, will use the default whitespace characters, including '\n'.

ws = ' \t'
pp.ParserElement.setDefaultWhitespaceChars(ws)

Without them, the expression for nets reads past the end of the first line of your subckt and includes "M1" as another net instead of as the identifier of the first instance in a subckt_core.
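
A rough sketch of why the placement matters: an expression keeps the whitespace set that was in effect when it was created. The names before and after are only for this illustration, and the first call simply restates pyparsing's stock default so the comparison is deterministic:

```python
import pyparsing as pp

# Restate pyparsing's stock default (includes newline) for a known starting point.
pp.ParserElement.setDefaultWhitespaceChars(' \n\t\r')

# Built while '\n' is still skippable whitespace.
before = pp.OneOrMore(pp.Word(pp.alphanums + '_'))

# Now make newlines significant, as the netlist grammar needs.
pp.ParserElement.setDefaultWhitespaceChars(' \t')

# Built after the change, so it stops at the end of the line.
after = pp.OneOrMore(pp.Word(pp.alphanums + '_'))

# Illustrative input: the tail of a subckt line followed by an instance name.
text = "vdd vdds\nM1"
tokens_before = before.parseString(text).asList()
tokens_after = after.parseString(text).asList()
print(tokens_before)
print(tokens_after)
```

The expression built before the change runs over the newline and takes M1 as one more net, while the one built afterwards stops at the end of the line.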

Not sure why you broke up your parser like this; it's best to keep the bits all together.