Scanning a language with non-delimited strings with nested tokens

Question

I want to create a lexer/parser for a language that has non-delimited strings.
Which part of a statement counts as a string is determined by the command that precedes it.

For example, it has statements that look like this:

pause 5
alert Hello world[CRLF] this contains 'pause' once (1)

In this instance, alert can be followed by any string, including keywords and numbers. To complicate things further, the text can contain tags like [CRLF] that I want to split out as separate tokens. Ideally, this would be broken up into:

[PAUSE][INT 5]
[ALERT][STR "Hello world"][CRLF][STR " this contains 'pause' once (1)"]
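
To pin down the expected output, here is a rough sketch of the behaviour in plain Python (not flex). It is only an illustration under several assumptions: the names tokenize_line and TAG_RE are made up for this example, a tag is assumed to be uppercase letters in square brackets, and pause and alert are assumed to be the only commands. The idea is that the command word selects a scanning mode for the rest of the line.

```python
import re

# Assumption: a tag is one or more uppercase letters in square brackets, e.g. [CRLF].
TAG_RE = re.compile(r"\[([A-Z]+)\]")


def tokenize_line(line):
    """Split one statement into (kind, value) tokens.

    The command word decides how the rest of the line is scanned.
    """
    command, _, rest = line.partition(" ")
    command = command.lower()

    if command == "pause":
        # Normal mode: the argument is an integer.
        return [("PAUSE", None), ("INT", int(rest))]

    if command == "alert":
        # String mode: everything is literal text except embedded tags,
        # so keywords and numbers inside the text are not tokenized.
        tokens = [("ALERT", None)]
        pos = 0
        for match in TAG_RE.finditer(rest):
            if match.start() > pos:
                tokens.append(("STR", rest[pos:match.start()]))
            tokens.append((match.group(1), None))
            pos = match.end()
        if pos < len(rest):
            tokens.append(("STR", rest[pos:]))
        return tokens

    raise ValueError(f"unknown command: {command!r}")


for statement in ["pause 5", "alert Hello world[CRLF] this contains 'pause' once (1)"]:
    print(tokenize_line(statement))
```

Whether this maps cleanly onto flex is the open question here; the sketch only shows the token stream I am after.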

I'm currently using flex, but from what I've gathered, this kind of thing isn't possible with it.
How can I achieve what I want here?
