Reputation: 96
I want to create a lexer/parser for a language that has non-delimited strings.
Which part of the language is a string is defined by the command preceding it.
For example it has statements that look like this:
pause 5
alert Hello world[CRLF] this contains 'pause' once (1)
The argument to alert can be any string, including keywords and numbers. Further complicating things, the text can contain tags like [CRLF] that I want to split out as separate tokens too. Ideally, I'd want this to be broken up into:
[PAUSE][INT 5]
[ALERT][STR "Hello world"][CRLF][STR " this contains 'pause' once (1)"]
I'm currently using flex but from what I've gathered this kind of thing isn't possible with flex.
How can I achieve what I want here?
Upvotes: 0
Views: 57
Reputation: 2413
(Since one of your tags is "regex", I'll suggest a non-flex approach.)
From the example, it seems like you could just match each line against:
^(\w+) (.+)
to obtain the command and its argument text, and then split the argument text on:
(\[\w+\])
(assuming your regex library's split function can return both the splitter strings and the split strings).
It's possible your actual situation is more complex and something like flex makes more sense, but I'm not really seeing it so far.
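As a rough illustration of the two-regex idea, here is a minimal Python sketch. Python's re.split does return the splitter strings when the pattern has a capturing group. The names TEXT_COMMANDS and tokenize_line are made up for this example, and it assumes that any command not listed as taking text takes a single integer, which the question doesn't actually say.

```python
import re

# Hypothetical: the commands whose argument is free text (not from the question).
TEXT_COMMANDS = {"alert"}

LINE_RE = re.compile(r"^(\w+) (.+)$")   # command, then the rest of the line
TAG_RE = re.compile(r"(\[\w+\])")       # capturing group, so split keeps the tags


def tokenize_line(line):
    """Turn one statement into a list of (kind, value) tokens."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"unrecognized statement: {line!r}")
    command, args = match.groups()
    tokens = [(command.upper(), None)]
    if command.lower() in TEXT_COMMANDS:
        for piece in TAG_RE.split(args):
            if not piece:
                continue  # split yields empty strings next to leading/adjacent tags
            if TAG_RE.fullmatch(piece):
                tokens.append((piece[1:-1], None))  # "[CRLF]" -> "CRLF"
            else:
                tokens.append(("STR", piece))
    else:
        # Assumption: every other command takes one integer argument.
        tokens.append(("INT", int(args)))
    return tokens


if __name__ == "__main__":
    print(tokenize_line("pause 5"))
    print(tokenize_line("alert Hello world[CRLF] this contains 'pause' once (1)"))
```

Because the command decides how the rest of the line is read, 'pause' and (1) inside the alert text never get a chance to be treated as a keyword or a number.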
Upvotes: 1