Aaron L

Reputation: 96

Scanning a language with non-delimited strings with nested tokens

I want to create a lexer/parser for a language that has non-delimited strings.
Which part of the language is a string is defined by the command preceding it.

For example it has statements that look like this:

pause 5
alert Hello world[CRLF] this contains 'pause' once (1) 

In this instance, the argument to alert can be any string, including keywords and numbers. Further complicating things, the text can contain tags like [CRLF] that I also want to emit as separate tokens. Ideally I'd want this to be broken up into:

[PAUSE][INT 5]
[ALERT][STR "Hello world"][CRLF][STR " this contains 'pause' once (1)"]

I'm currently using flex, but from what I've gathered, this kind of thing isn't possible with flex.
How can I achieve what I want here?

Upvotes: 0

Views: 57

Answers (1)

Michael Dyck

Reputation: 2413

(Since one of your tags is "regex", I'll suggest a non-flex approach.)

From the example, it seems like you could just:

  1. match each line against ^(\w+) (.+) to obtain command and arguments-text, and then
  2. get individual arguments by splitting the arguments-text on (\[\w+\]) (assuming your regex library's split function can return both the splitter-strings and the split-strings).
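The two steps above can be sketched with Python's re module, whose re.split keeps the matched separators in the result when the pattern contains a capturing group. This is a minimal sketch using the regexes from the steps as written; split_line is a hypothetical helper name, and it assumes every line is a command followed by a space and at least one character of argument text (a bare command with no arguments would not match).

```python
import re

# Step 1: a command word, one space, then the rest of the line as argument text.
LINE_RE = re.compile(r"^(\w+) (.+)")
# Step 2: bracketed tags such as [CRLF]. The capturing group makes re.split
# return the tags themselves alongside the text between them.
TAG_RE = re.compile(r"(\[\w+\])")

def split_line(line):
    """Hypothetical helper: return (command, list of argument pieces)."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a command line: {line!r}")
    command, args_text = m.groups()
    # re.split yields empty strings when a tag sits at the very start or end
    # of the text (or two tags are adjacent); drop those.
    parts = [p for p in TAG_RE.split(args_text) if p]
    return command, parts

for line in ["pause 5", "alert Hello world[CRLF] this contains 'pause' once (1)"]:
    print(split_line(line))
# ('pause', ['5'])
# ('alert', ['Hello world', '[CRLF]', " this contains 'pause' once (1)"])
```

Because the command is read first and everything after it is treated as opaque text, words like pause or numbers inside the alert text are never mistaken for keywords.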

It's possible your actual situation is more complex and something like flex makes more sense, but I'm not really seeing it so far.
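If different commands need their argument text scanned differently (in the question, pause takes a number while alert takes free text with tags), the same regex approach can dispatch on the command word. A minimal sketch, assuming only the two commands from the example; ARG_SCANNERS, scan_int, scan_text and tokenize_line are hypothetical names, and the token tuples simply mirror the bracketed notation in the question.

```python
import re

LINE_RE = re.compile(r"^(\w+) (.+)")   # command, space, argument text
TAG_RE = re.compile(r"(\[\w+\])")      # [TAG] markers, kept by re.split

def scan_int(text):
    # Hypothetical rule: the whole argument text is a single integer.
    return [("INT", int(text))]

def scan_text(text):
    # Free text: everything is a string except [TAG] markers,
    # which become tokens named after the tag.
    tokens = []
    for part in TAG_RE.split(text):
        if not part:
            continue  # empty pieces around leading/trailing/adjacent tags
        if TAG_RE.fullmatch(part):
            tokens.append((part[1:-1].upper(),))
        else:
            tokens.append(("STR", part))
    return tokens

# Hypothetical command table: which scanner handles each command's arguments.
ARG_SCANNERS = {"pause": scan_int, "alert": scan_text}

def tokenize_line(line):
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a command line: {line!r}")
    command, args_text = m.groups()
    scanner = ARG_SCANNERS.get(command.lower())
    if scanner is None:
        raise ValueError(f"unknown command: {command!r}")
    return [(command.upper(),)] + scanner(args_text)

print(tokenize_line("pause 5"))
# [('PAUSE',), ('INT', 5)]
print(tokenize_line("alert Hello world[CRLF] this contains 'pause' once (1)"))
# [('ALERT',), ('STR', 'Hello world'), ('CRLF',), ('STR', " this contains 'pause' once (1)")]
```

Adding a command then means adding one entry to the table; only if the argument grammars themselves become nested or recursive does a generated scanner/parser start to pay off.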

Upvotes: 1
