psygo
psygo

Reputation: 7643

SGF Grammar Parser with Peggy

Ideally, I would like to parse SGF's complete grammar. However, at this point, I'm stuck at trying to handle the recursive part only. Here's my feeble attempt at it so far:

import { generate } from "peggy"

const grammar = /* peggy */ `
  string = .*

  parensList = '(' string parensList ')'
             / string
`

const sgf1 =
  "(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25];B[pd];W[dd];B[pq];W[dp])"

const parser = generate(grammar)

const parse = parser.parse(sgf1)

console.log(parse)
// [
//   '(', ';', 'G', 'M', '[', '1', ']', 'F', 'F', '[', '4',
//   ']', 'C', 'A', '[', 'U', 'T', 'F', '-', '8', ']', 'A',
//   'P', '[', 'S', 'a', 'b', 'a', 'k', 'i', ':', '0', '.',
//   '5', '2', '.', '2', ']', 'K', 'M', '[', '6', '.', '5',
//   ']', 'S', 'Z', '[', '1', '9', ']', 'D', 'T', '[', '2',
//   '0', '2', '3', '-', '1', '2', '-', '2', '5', ']', ';',
//   'B', '[', 'p', 'd', ']', ';', 'W', '[', 'd', 'd', ']',
//   ';', 'B', '[', 'p', 'q', ']', ';', 'W', '[', 'd', 'p',
//   ']', ')'
// ]

Peggy is the successor of Peg.js.

I think I'm failing to identify how to make this recursive properly. How do I make it identify a ( and get into another level with parensList? (I think I need to define string without ( and ) as well...)

What I'm expecting as a result is some sort of tree or JSON like this:

<Branch>{
  moves: [
    <Move>{
      [property]: <Array<String>>
    },
    ...
  ],
  children: <Array<Branch>>
}

But this would be fine as well:

<NodeObject>{
  data: {
    [property]: <Array<String>>
  },
  children: <Array<NodeObject>>
}

SGF is basically a text-based tree format for saving Go (board game) records. Here's an example — SGF doesn't support comments, and it's usually a one-liner, the code below is just to make it easier to read and understand —:

(
  ;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25] // Game Metadata
  ;B[pd] // Black's Move (`pd` = coordinates on the board)
  ;W[dd] // White's Move
    ( // Parentheses denote a branch in the tree
      ;B[pq]
      ;W[dp]
    )
    (
      ;B[dp]
      ;W[pp]
    )
)

You could also have more than one tree at the top, which would yield something like (tree)(tree)...:

(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25];B[pd];W[dd];B[pq];W[dp])(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25];B[pd];W[dd](;B[pq];W[dp])(;B[dp];W[pp]))

The whole grammar is this:

Collection     = { GameTree }
GameTree       = "(" RootNode NodeSequence { Tail } ")"
Tail           = "(" NodeSequence { Tail } ")"
NodeSequence   = { Node }
RootNode       = Node
Node           = ";" { Property }
Property       = PropIdent PropValue { PropValue }
PropIdent      = UcLetter { UcLetter }
PropValue      = "[" Value "]"
UcLetter       = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

You can use the editor Sabaki to create SGF files.

Upvotes: 2

Views: 339

Answers (2)

Arjun S
Arjun S

Reputation: 9

The problem with your current grammar is that it doesn't correctly handle the recursive nature of the SGF format. Here's a version of the SGF grammar that should parse the example you provided:

Collection = GameTree+
GameTree = "(" Sequence (GameTree)* ")"
Sequence = Node+
Node = ";" Property+
Property = PropIdent PropValue+
PropIdent = [A-Z]+
PropValue = "[" text "]"
text = [^\\]]*

Here's how it goes:

const grammar = `
  Collection = GameTree+
  GameTree = "(" Sequence (GameTree)* ")"
  Sequence = Node+
  Node = ";" Property+
  Property = PropIdent PropValue+
  PropIdent = [A-Z]+
  PropValue = "[" text "]"
  text = [^\\]]*
`

// const input = "(AA(BB(CC))(B2))"
const sgf2 = "(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25];B[pd];W[dd](;B[pq];W[dp])(;B[dp];W[pp]))"

const parser = peggy.generate(grammar)

// const astInput = parser.parse(input)
const astSgf2 = parser.parse(sgf2)

// console.log(astInput)
console.log(astSgf2)
<script src="https://unpkg.com/peggy"></script>

This grammar is designed to parse an SGF file into a tree of nodes, where each node represents a move with properties. The Collection rule matches one or more GameTrees. A GameTree is a sequence of nodes (moves), possibly followed by more game trees (branches). Each Node is a semicolon followed by one or more Propertys. A Property is an identifier followed by one or more values. The PropIdent rule matches one or more uppercase letters, and the PropValue rule matches a value enclosed in square brackets.

Note: This grammar doesn't cover all details of the SGF format, but it provides a solid foundation that can be extended based on specific requirements.

Upvotes: 0

gog
gog

Reputation: 11347

Here's something to get you started:

const grammar = `
  Parens = 
      '(' content:String  rest:Parens* ')' { return {content, rest} }
      / String

  String = [^()]+ { return text() }
`

const input = "(AA(BB(CC))(B2))"
const sgf2 = "(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2023-12-25];B[pd];W[dd](;B[pq];W[dp])(;B[dp];W[pp]))"

const parser = peggy.generate(grammar)

const astInput = parser.parse(input)
const astSgf2 = parser.parse(sgf2)

console.log(astInput)
console.log(astSgf2)
<script src="https://unpkg.com/peggy"></script>

The problem with your grammar seems to be that string is too greedy and consumes the closing ) as well, causing the parens rule to fail.

Upvotes: 1

Related Questions