How to use FParsec to parse identifiers with different start and end characters

Question

I'm having difficulty working out the best way to parse identifiers whose first and last characters follow different rules from the rest. For example, let's say that an identifier must start with an upper- or lowercase letter, while the middle of an identifier may also include digits and colons. An identifier may not end with a colon, but may end with an apostrophe.

So the following are all legal identifiers:

f, f0, f:', f000:sdfsd:asdf

But the following are not:

0, hello:, he'llo

I can't see how best to handle the backtracking: a colon is fine in the middle, but we need some lookahead to determine whether we are at the end of the identifier.

EDIT:

Thanks for the suggestions. Using a regex is a pragmatic approach, but I find it slightly disappointing that there doesn't seem to be a clean/obvious way of doing this otherwise.

Kcvin · Accepted Answer

I also think you should use a regex; however, I came up with a different pattern:

let pattern = regex @"^([a-zA-Z](?:[a-zA-Z0-9:]*[a-zA-Z0-9'])?)$"

which will hold all of the matches you want in the first group. You can use an online regex tool to validate your matches and grouping.
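
To check the idea against the examples from the question, here is a minimal sketch written as an F# script. It shows two options side by side: a regex-based parser whose pattern requires the last character to be a letter, digit, or apostrophe, and a combinator-only parser that takes a run of colons only when something legal follows it. The names identRegex, identCombinators, colons, and accepts are made up for illustration, and the NuGet reference assumes you run it with dotnet fsi. Treat it as one possible approach, not the definitive one.

```fsharp
// Script sketch for `dotnet fsi`; the NuGet reference is an assumption about your setup.
#r "nuget: FParsec"
open FParsec

// Letters and digits may appear anywhere after the first character.
let isAlnum (c: char) = isAsciiLetter c || isDigit c

// Option 1: regex. The last character must be a letter, digit, or apostrophe,
// so a match can never end in a colon. There are no ^ or $ anchors here:
// as far as I know, FParsec's regex parser already matches only at the
// current position, and `eof` is added separately in `accepts` below.
let identRegex : Parser<string, unit> =
    regex @"[a-zA-Z](?:[a-zA-Z0-9:]*[a-zA-Z0-9'])?"

// Option 2: combinators. A run of colons is consumed only when a letter,
// digit, or apostrophe follows it; otherwise `attempt` backtracks to the
// position before the colons.
let colons : Parser<string, unit> =
    let afterColons = satisfy (fun c -> isAlnum c || c = '\'')
    attempt (many1Satisfy (fun c -> c = ':') .>> followedBy afterColons)

let identCombinators : Parser<string, unit> =
    pipe3 (satisfy isAsciiLetter)
          (manyStrings (many1Satisfy isAlnum <|> colons))
          (opt (pchar '\''))
          (fun first middle last ->
              string first + middle + (match last with Some c -> string c | None -> ""))

// True when the parser succeeds and consumes the whole input.
let accepts (p: Parser<string, unit>) (input: string) =
    match run (p .>> eof) input with
    | Success(_, _, _) -> true
    | Failure(_, _, _) -> false

// Examples taken from the question.
let legal = [ "f"; "f0"; "f:'"; "f000:sdfsd:asdf" ]
let illegal = [ "0"; "hello:"; "he'llo" ]

for s in legal @ illegal do
    printfn "%-16s regex=%b combinators=%b" s (accepts identRegex s) (accepts identCombinators s)
```

Both parsers should accept the four legal examples and reject the three illegal ones. Note that without the trailing eof, either parser would succeed on "hello:" by returning "hello" and leaving the colon unconsumed, so whether that counts as an error depends on what your grammar expects next.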
