Reputation: 131
I'm working on an EDI file parser, and I'm having considerable difficulty implementing an escape for the 'segment terminator'. For anyone fortunate enough to not work with EDI, the segment terminator (usually an apostrophe) is the deliter between segments, which are like cells.
The desired behaviour looks something like this:
ABC+123'DEF+567' -> ["ABC+123", "DEF+567"]
ABC+123?'DEF+567' -> ["ABC+123?'DEF+567"]
Using FParsec, without escaping the apostrophe (and, for simplicity, ignoring parameterisation), the parser looks something like this:
let pSegment = //logic to parse the contents of a segment
let pAllSegments = sepEndBy pSegment (str "'")
This approach with the above example would yield ["ABC+123?", "DEF+567"]
.
My next consideration was to use a regex:
let pAllSegments = sepEndBy pSegment (regex @"[^\?]'")
The problem here is that the character prior to the apostrophe is also consumed, leading to incomplete messages.
I'm fairly certain I just don't understand FParsec well enough here. Does anyone have any pointers?
Upvotes: 0
Views: 91
Reputation: 131
The issue is in the parse contents step.
The parser is working 'bottom up'. It finds the contents of the segments, which are not permitted to contain the terminator, then finds that all these segments are separated by the terminator, and constructs the list.
My error was in the pSegment
step, which was using a parameterised version of (?:[A-Za-z0-9 \\.]|\?[\?\+:\?])*
. See that second ?
? That should have been a '
.
Upvotes: 1