rneatherway

Reputation: 553

How to use FParsec to parse identifiers with different start and end characters

I'm having difficulty working out the best way to parse identifiers that have different characters at the start and end. For example, let's say that the first character of an identifier may only be an upper- or lowercase letter, while the middle of an identifier may also include digits and colons. The last character may not be a colon, but may be an apostrophe.

So the following are all legal identifiers:

f, f0, f:', f000:sdfsd:asdf

But the following are not:

0, hello:, he'llo

I can't see how best to handle the backtracking: a colon is fine in the middle, but we need some lookahead to determine whether we are at the end of the identifier.
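For comparison with the regex answers, here is one way the lookahead could be written with plain FParsec combinators. It is a sketch under some assumptions: it runs as a script under dotnet fsi (F# 5 or later) with FParsec pulled from NuGet, letters are ASCII only, a single letter followed by an apostrophe (such as f') counts as legal, and the names isStartChar, alnumRun, colonRun, ident and isIdentifier are made up for illustration. The idea is that a run of colons is only kept when the next character could legally follow it; otherwise attempt rewinds so the colon is never part of the identifier.

```fsharp
// Assumes `dotnet fsi` (F# 5 or later) so the package reference below resolves.
#r "nuget: FParsec"
open FParsec

// Character classes from the question; ASCII letters are an assumption.
let isStartChar c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let isAlnum c = isStartChar c || (c >= '0' && c <= '9')
let canEndWith c = isAlnum c || c = '\''

// Letters and digits are always safe to consume after the first character.
let alnumRun : Parser<string, unit> = many1Satisfy isAlnum

// A run of colons is kept only if the next character could legally follow it.
// If not, `attempt` rewinds to before the colons, so a trailing colon is left
// in the input rather than becoming part of the identifier.
let colonRun : Parser<string, unit> =
    attempt (many1Satisfy (fun c -> c = ':') .>> followedBy (satisfy canEndWith))

// First letter, then any mix of the runs above, then an optional final apostrophe.
let ident : Parser<string, unit> =
    pipe3 (satisfy isStartChar)
          (manyStrings (alnumRun <|> colonRun))
          (opt (pchar '\''))
          (fun first middle tick ->
              string first + middle + (tick |> Option.map string |> Option.defaultValue ""))

// True only if the entire input is one identifier.
let isIdentifier (s: string) =
    match run (ident .>> eof) s with
    | Success (_, _, _) -> true
    | Failure (_, _, _) -> false
```

Used on its own inside a larger grammar, ident consumes the longest legal identifier and stops: on hello: it returns hello and leaves the colon for whatever parser comes next. isIdentifier adds eof, so it accepts f, f0, f:' and f000:sdfsd:asdf and rejects 0, hello: and he'llo.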

EDIT:

Thanks for the suggestions. Using a regex is a pragmatic approach, but I find it slightly disappointing that there doesn't seem to be a clean, obvious way of doing this otherwise.

Upvotes: 0

Views: 268

Answers (2)

Kcvin

Reputation: 5163

I also think you should use a regex; however, I came up with a different pattern:

open FParsec
let pattern : Parser<string, unit> = regex @"^([a-zA-Z]([a-zA-Z0-9:]*[a-zA-Z0-9'])?)$"

This puts each identifier you want in the first capture group. You can check the matches and grouping with an online regex tester.
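The question's examples can also be checked offline with plain .NET regexes from F#, no FParsec needed. The pattern below is a variant written for this check, not necessarily the one in the answer: the part after the first letter is optional, but when present it must end in a letter, digit or apostrophe, which is what keeps hello: out. The names identPattern, legal and illegal are only illustrative.

```fsharp
open System.Text.RegularExpressions

// Anchored whole-string check: a letter, optionally followed by letters, digits
// and colons that finish on a letter, digit or apostrophe (never a colon).
let identPattern = Regex(@"^[a-zA-Z](?:[a-zA-Z0-9:]*[a-zA-Z0-9'])?$")

let legal = [ "f"; "f0"; "f:'"; "f000:sdfsd:asdf" ]
let illegal = [ "0"; "hello:"; "he'llo" ]

// Print each example next to whether the pattern accepts it.
for s in legal @ illegal do
    printfn "%-16s %b" s (identPattern.IsMatch s)
```

This prints true for the four legal identifiers and false for the three illegal ones. Because of the ^ and $ anchors it only suits whole-string validation; inside a parser that reads identifiers out of longer input, an unanchored pattern is usually what you want.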

Upvotes: 1

mlambert

Reputation: 81

You can handle this with FParsec's regex parser:

open FParsec
let ident : Parser<string, unit> = regex @"[A-Za-z]([A-Za-z0-9:]*[A-Za-z0-9'])?"

http://www.quanttec.com/fparsec/reference/charparsers.html
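A small script shows how a regex parser of this kind behaves inside FParsec. It assumes dotnet fsi (F# 5 or later) with FParsec from NuGet, and relies on FParsec's regex matching the pattern starting at the current position, as the reference page above describes. The pattern wraps everything after the first letter in an optional group so that one-letter identifiers such as f also match. tryIdent and isWholeIdent are hypothetical helper names.

```fsharp
// Assumes `dotnet fsi` (F# 5 or later) so the package reference below resolves.
#r "nuget: FParsec"
open FParsec

// Unanchored: the optional group lets a single letter such as "f" match,
// and when present the group must end in a letter, digit or apostrophe.
let ident : Parser<string, unit> = regex @"[A-Za-z]([A-Za-z0-9:]*[A-Za-z0-9'])?"

// Run `ident` at the start of the input and return what it consumed, if anything.
let tryIdent (input: string) =
    match run ident input with
    | Success (name, _, _) -> Some name
    | Failure (_, _, _) -> None

// Demand that the whole input is a single identifier.
let isWholeIdent (input: string) =
    match run (ident .>> eof) input with
    | Success (_, _, _) -> true
    | Failure (_, _, _) -> false

for s in [ "f"; "f0"; "f:'"; "f000:sdfsd:asdf"; "0"; "hello:"; "he'llo" ] do
    printfn "%-16s prefix=%A whole=%b" s (tryIdent s) (isWholeIdent s)
```

Run on its own, ident takes the longest legal prefix: the regex engine backtracks so that on hello: it returns hello and leaves the colon in the input, and on he'llo it returns he'. Adding .>> eof, as isWholeIdent does, turns both into failures.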

Upvotes: 1
