Reputation: 71
I am working on a parser using flex/bison that parses linear temporal logic formulas. I am also using the same scanner to parse the input for these formulas.
I am using this regex to scan for identifier names:
[a-zA-Z][a-z \tA-Z0-9_-]*[a-zA-Z0-9]
Identifier names are delimited by commas in the input file, and are surrounded by operators/special characters in the formulas, and the program deals with them fine. Having whitespace in identifier names is not a problem in itself. I wouldn't allow whitespace if I had the choice, but the input file is generated by another program so I cannot change this.
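This regex works fine; however, it does not allow single-character identifiers, which I would like to have. It requires a leading letter plus a separate trailing alphanumeric character, so the shortest possible match is two characters.
Basically, I need a regex that matches single-character identifiers as well as alphanumeric strings with optional whitespace between the characters, and that does not match any leading or trailing whitespace.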
Hopefully my phrasing is clear enough for you to understand.
Thanks!
Upvotes: 0
Views: 206
Reputation: 626738
Use
[a-zA-Z]([a-z \tA-Z0-9_-]*[a-zA-Z0-9])?
The (...)? group makes the [a-z \tA-Z0-9_-]*[a-zA-Z0-9] part optional: ? is a greedy quantifier that matches one or zero occurrences of the preceding subpattern.
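To see what the optional group does, here is a minimal sketch that tries the pattern on a few inputs. It uses Python's re module as a stand-in for flex, on the assumption that the character-class syntax in this pattern behaves the same in both; scan_identifier is a hypothetical helper name, not part of flex or bison.

```python
import re

# Same pattern as above; Python's `re` stands in for flex here (assumption).
IDENT = re.compile(r"[a-zA-Z]([a-z \tA-Z0-9_-]*[a-zA-Z0-9])?")


def scan_identifier(text):
    """Return the identifier matched at the start of text, or None.

    Hypothetical helper that mimics a scanner matching at the
    current input position.
    """
    m = IDENT.match(text)
    return m.group(0) if m else None


print(scan_identifier("p"))               # p
print(scan_identifier("foo bar  , baz"))  # foo bar
print(scan_identifier("x1 & y"))          # x1
print(scan_identifier(" p"))              # None
```

Because the optional part must end in [a-zA-Z0-9], trailing spaces before a comma or operator are left out of the match, while a lone letter matches through the mandatory first class alone.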
Upvotes: 0
Reputation: 5657
[^,]+
It matches any non-comma character, 1 or more times. You can always trim the whitespace off.
(From regexper.com)
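A rough sketch of this match-then-trim approach, again using Python's re module as a stand-in. It only applies to the comma-delimited input file, not to formulas where identifiers sit between operators, and it accepts any non-comma text without checking that it is a valid identifier. split_identifiers is a hypothetical helper name.

```python
import re

# One or more non-comma characters: one raw field per match.
FIELD = re.compile(r"[^,]+")


def split_identifiers(line):
    """Return the comma-separated names in line with whitespace trimmed.

    Hypothetical helper: fields that are empty after trimming are dropped.
    """
    names = (m.group(0).strip() for m in FIELD.finditer(line))
    return [n for n in names if n]


print(split_identifiers("p, foo bar ,x_1"))  # ['p', 'foo bar', 'x_1']
```

In a flex scanner the trimming would have to happen in the rule's action instead, since the pattern itself still matches the surrounding whitespace.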
Upvotes: 1