Reputation: 8480
I need to parse a custom file format in C#. The format is the PBX file of an Xcode project. There is no official documentation for the format, but it is fairly straightforward. Here is a simple example:
// !$*UTF8*$!
{
    archiveVersion = 1;
    classes = {
    };
    objectVersion = 46;
    objects = {
        /* Begin PBXBuildFile section */
        5143B90C1884374800F27FD8 /* Foundation.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90B1884374800F27FD8 /* Foundation.framework */; };
        5143B90E1884374800F27FD8 /* CoreGraphics.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90D1884374800F27FD8 /* CoreGraphics.framework */; };
        5143B9101884374800F27FD8 /* UIKit.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 5143B90F1884374800F27FD8 /* UIKit.framework */; };
        /* End PBXBuildFile section */
    };
    rootObject = 5143B9001884374800F27FD8 /* Project object */;
}
The objects section contains a sequence of object definitions: each object's unique ID followed by its properties. As you can see, the file also contains comments, and property values can be enclosed in quotes.
A complete example of a PBX file is here.
Now I need to build a DOM of the file. What is the best approach for this kind of task?
Upvotes: 3
Views: 2107
Reputation: 21072
Use a parser (because of the nested braces, regex is a no-go). Pick the one whose syntax you feel comfortable with:
I guess you are new to this, which is why I grouped them by approach: top-down, bottom-up, and combinator. My personal preference is bottom-up, because defining mathematical expressions feels more natural to me that way, but you should not run into that kind of problem here.
As of 2014-01-28, NLT includes a simple reader for PBXProj files.
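Whichever tool you pick, the input is normally split into tokens before the grammar rules are applied. Regexes cannot match the nested braces, but they are adequate for the individual tokens. Here is a minimal lexer sketch to illustrate that split. It is not tied to any of the tools above; `PbxLexer`, `Tokenize` and `TokenPattern` are names made up for this example, and the set of characters allowed in an unquoted word is an assumption based on the sample in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: splits PBX text into tokens, dropping whitespace and comments.
public static class PbxLexer
{
    // \G anchors each match at the current position, so nothing is silently skipped.
    // Alternatives are tried left to right: trivia, quoted string, bare word, punctuation.
    // The bare-word character set is an assumption, not taken from a specification.
    private static readonly Regex TokenPattern = new Regex(
        @"\G(?:(?<skip>\s+|/\*.*?\*/|//[^\n]*)" +
        @"|(?<quoted>""(?:\\.|[^""\\])*"")" +
        @"|(?<bare>(?:[A-Za-z0-9_.$]|/(?!\*))+)" +
        @"|(?<punct>[{}()=;,]))",
        RegexOptions.Singleline);

    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            Match m = TokenPattern.Match(text, pos);
            if (!m.Success)
                throw new FormatException("unexpected character at offset " + pos);
            // Whitespace and comments are matched but not emitted.
            if (!m.Groups["skip"].Success)
                tokens.Add(m.Value);
            pos += m.Length;
        }
        return tokens;
    }
}
```

Quoted tokens keep their surrounding quotes here, so the parser that consumes the list can still tell a quoted value from a bare word.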
Upvotes: 1
Reputation: 21712
If you need to be able to match nested braces, regexes will not work. You could use a parser generator like ANTLR, but this format looks simple enough to write your own recursive descent parser.
Before we can show you how to write the parser, we need to know what kind of DOM you want to output.
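In the meantime, here is a rough sketch of what a hand-written recursive descent parser could look like. It assumes the simplest possible DOM: every value is a `string`, a `Dictionary<string, object>`, or a `List<object>`. The class name `PbxParser`, the parenthesized list syntax (not shown in your sample), and the set of characters accepted in unquoted words are assumptions on my part, and escape sequences in quoted strings are not decoded.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical DOM: every value is a string, a List<object>, or a Dictionary<string, object>.
public sealed class PbxParser
{
    private readonly string _text;
    private int _pos;

    private PbxParser(string text) { _text = text; }

    public static object Parse(string text)
    {
        var parser = new PbxParser(text);
        object root = parser.ParseValue();
        parser.SkipTrivia();
        if (parser._pos != text.Length) throw parser.Error("unexpected trailing input");
        return root;
    }

    // value := dictionary | list | string
    private object ParseValue()
    {
        SkipTrivia();
        if (At("{")) return ParseDictionary();
        if (At("(")) return ParseList();
        return ParseString();
    }

    // dictionary := '{' (string '=' value ';')* '}'
    private Dictionary<string, object> ParseDictionary()
    {
        var result = new Dictionary<string, object>();
        Expect('{');
        while (!TryConsume('}'))
        {
            string key = ParseString();
            Expect('=');
            result[key] = ParseValue();
            Expect(';');
        }
        return result;
    }

    // list := '(' (value ',')* value? ')'   (assumed syntax, not in the sample)
    private List<object> ParseList()
    {
        var result = new List<object>();
        Expect('(');
        while (!TryConsume(')'))
        {
            result.Add(ParseValue());
            if (!TryConsume(',')) { Expect(')'); break; }
        }
        return result;
    }

    // string := quoted text or a run of bare-word characters
    private string ParseString()
    {
        SkipTrivia();
        var sb = new StringBuilder();
        if (At("\""))
        {
            _pos++;
            while (_pos < _text.Length && _text[_pos] != '"')
            {
                // Escapes are not decoded: the character after a backslash is kept as-is.
                if (_text[_pos] == '\\' && _pos + 1 < _text.Length) _pos++;
                sb.Append(_text[_pos++]);
            }
            if (_pos >= _text.Length) throw Error("unterminated string");
            _pos++; // closing quote
            return sb.ToString();
        }
        while (_pos < _text.Length && IsBareChar(_text[_pos]) && !At("/*")) sb.Append(_text[_pos++]);
        if (sb.Length == 0) throw Error("expected a value");
        return sb.ToString();
    }

    // Assumed character set for unquoted words.
    private static bool IsBareChar(char c) { return char.IsLetterOrDigit(c) || "_./$".IndexOf(c) >= 0; }

    private bool At(string s) { return string.CompareOrdinal(_text, _pos, s, 0, s.Length) == 0; }

    // Skips whitespace, /* block */ comments, and // line comments.
    private void SkipTrivia()
    {
        while (_pos < _text.Length)
        {
            if (char.IsWhiteSpace(_text[_pos])) { _pos++; }
            else if (At("/*"))
            {
                int end = _text.IndexOf("*/", _pos + 2, StringComparison.Ordinal);
                if (end < 0) throw Error("unterminated comment");
                _pos = end + 2;
            }
            else if (At("//"))
            {
                int end = _text.IndexOf('\n', _pos);
                _pos = end < 0 ? _text.Length : end + 1;
            }
            else break;
        }
    }

    private bool TryConsume(char c)
    {
        SkipTrivia();
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    private void Expect(char c) { if (!TryConsume(c)) throw Error("expected '" + c + "'"); }

    private FormatException Error(string message) { return new FormatException(message + " at offset " + _pos); }
}
```

With this sketch, `PbxParser.Parse(File.ReadAllText(path))` returns the root dictionary as an `object`; cast it to `Dictionary<string, object>` and index `"objects"` to reach the object table. Comments are discarded, so if you need them in the DOM (for example the `/* Foundation.framework */` annotations), `SkipTrivia` is the place to capture them instead.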
Upvotes: 0
Reputation: 99889
I use the Regex classes when they are suitable, but for more structured data like you've shown here I would turn to ANTLR as documented here for C#.
Upvotes: 0