Turn raw GEDCOM-format string into structured JSON

Question

I'm working on a GEDCOM parser. The raw data for it looks like this:

0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 SOUR GS
2 NAME GEDCOM Specification
2 VERS 5.5.5
2 CORP gedcom.org
3 ADDR
4 CITY LEIDEN
3 WWW www.gedcom.org
1 DATE 2 Oct 2019
2 TIME 0:00:00
1 FILE 555Sample.ged
1 LANG English
1 SUBM @U1@

The format is built on a system of nesting levels, given as the number at the beginning of each line. The level-0 line is the parent, the level-1 line 'GEDC' is its child, 'GEDC' in turn has a level-2 child named 'VERS', and so on. I could hard-code something that puts all the 2s inside a JSON property named after the previous 1, etc., but that wouldn't be very efficient, and I wouldn't be able to deal with deeper levels such as:

"SOUR": {
  "GS": {
    "NAME": "GEDCOM Specification",
    "VERS": "5.5.5",
    "CORP": {
      "gedcom.org": {
        "ADDR": {
          "CITY": "LEIDEN"
        },
        "WWW": "www.gedcom.org"
      }
    }
  }
},

How would you approach this problem, and what resources could help lead me in the right direction?
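
For reference, here is a minimal sketch of one way the level numbers could be handled without hard-coding any depth: keep a stack of the most recent node at each level, attach every new line to the node one level above it, and only afterwards convert the tree into the nested shape shown above. It is written in Python purely for illustration; the names `parse_lines`, `shape`, and `gedcom_to_dict` are made up for this example. It assumes every line is `LEVEL TAG [VALUE]`, so it does not handle cross-reference IDs such as `0 @I1@ INDI`, and repeated tags under the same parent would overwrite each other.

```python
import json

SAMPLE = """\
0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 SOUR GS
2 NAME GEDCOM Specification
2 VERS 5.5.5
2 CORP gedcom.org
3 ADDR
4 CITY LEIDEN
3 WWW www.gedcom.org
1 DATE 2 Oct 2019
2 TIME 0:00:00
1 FILE 555Sample.ged
1 LANG English
1 SUBM @U1@
"""


def parse_lines(text):
    """Build a tree of {tag, value, children} nodes from level-numbered lines."""
    root = {"tag": None, "value": None, "children": []}
    # stack[0] is the root; stack[n + 1] is the latest node seen at level n.
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split into at most three parts so values may contain spaces.
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else None
        node = {"tag": tag, "value": value, "children": []}
        # Forget every node at this level or deeper; the parent is what remains on top.
        del stack[level + 1:]
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def shape(node):
    """Convert one node into the nested form used in the question."""
    if not node["children"]:
        return node["value"]          # leaf: "TAG": "value"
    nested = {child["tag"]: shape(child) for child in node["children"]}
    if node["value"] is None:
        return nested                 # e.g. "ADDR": {"CITY": "LEIDEN"}
    return {node["value"]: nested}    # e.g. "SOUR": {"GS": {...}}


def gedcom_to_dict(text):
    root = parse_lines(text)
    return {child["tag"]: shape(child) for child in root["children"]}


if __name__ == "__main__":
    print(json.dumps(gedcom_to_dict(SAMPLE), indent=2))
```

Building the tree first and shaping it afterwards keeps the two concerns apart: whether a value ends up as a plain string or as a key depends on whether the line has children, which is only known once the following lines have been read.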

