Reputation: 3
I'm working on a GEDCOM parser. The raw data for it looks like this:
0 HEAD
1 GEDC
2 VERS 5.5.5
2 FORM LINEAGE-LINKED
3 VERS 5.5.5
1 CHAR UTF-8
1 SOUR GS
2 NAME GEDCOM Specification
2 VERS 5.5.5
2 CORP gedcom.org
3 ADDR
4 CITY LEIDEN
3 WWW www.gedcom.org
1 DATE 2 Oct 2019
2 TIME 0:00:00
1 FILE 555Sample.ged
1 LANG English
1 SUBM @U1@
The format is built on a system of indentation, here in the form of a level number at the beginning of each line. 0 is the parent, 1 is its child named 'GEDC', and 'GEDC' in turn has a child named 'VERS', and so on. I could hard-code something that puts all 2s inside a JSON property named after the previous 1, etc. But that wouldn't be very efficient, and I wouldn't be able to deal with deeper levels such as:
"SOUR": {
  "GS": {
    "NAME": "GEDCOM Specification",
    "VERS": "5.5.5",
    "CORP": {
      "gedcom.org": {
        "ADDR": {
          "CITY": "LEIDEN"
        },
        "WWW": "www.gedcom.org"
      }
    }
  }
},
How would you approach this problem, and what resources could help point me in the right direction?
Upvotes: 0
Views: 642
Reputation: 20132
There are a number of good GEDCOM parser libraries available. Do a web search for "GEDCOM parser" and you'll come up with quite a few of them that can give you ideas and links to further help.
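If you'd rather write it yourself, one common way to handle arbitrary depth is to keep a stack holding the most recently seen node at each level, so a line at level N always attaches to whatever sits at level N-1. Here is a minimal sketch in Python, not a full GEDCOM implementation. The names parse_gedcom and to_nested are made up for illustration. It assumes every line has the form "level tag [value]", so level-0 records that carry a cross-reference ID (such as "0 @I1@ INDI") would need extra handling, and repeated tags under the same parent would overwrite each other in the nested output.

```python
import json


def parse_gedcom(text):
    """Parse GEDCOM lines into a tree of {"tag", "value", "children"} nodes."""
    root = {"tag": "ROOT", "value": None, "children": []}
    # stack[0] is the artificial root; stack[i + 1] is the latest node at level i.
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split into at most three parts so the value keeps its own spaces.
        parts = line.split(" ", 2)
        level = int(parts[0])
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else None
        node = {"tag": tag, "value": value, "children": []}
        # Discard nodes at this level or deeper; the parent is then on top.
        del stack[level + 1:]
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def to_nested(nodes):
    """Convert the node tree into nested dicts shaped like the question's JSON."""
    result = {}
    for node in nodes:
        if node["children"]:
            inner = to_nested(node["children"])
            # A line with both a value and children nests the children under
            # the value, e.g. "SOUR": {"GS": {...}}.
            entry = inner if node["value"] is None else {node["value"]: inner}
        else:
            entry = node["value"]
        # Assumption: tags are unique per parent; duplicates overwrite here.
        result[node["tag"]] = entry
    return result


if __name__ == "__main__":
    sample = "\n".join([
        "0 HEAD",
        "1 SOUR GS",
        "2 NAME GEDCOM Specification",
        "2 CORP gedcom.org",
        "3 ADDR",
        "4 CITY LEIDEN",
        "3 WWW www.gedcom.org",
    ])
    print(json.dumps(to_nested(parse_gedcom(sample)), indent=2))
```

Keeping the parsing step (which only tracks levels) separate from the step that shapes the output makes it easier to change the JSON layout later, for example to store repeated tags as lists.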
Upvotes: 0