merkman

Reputation: 71

Regex help needed for matching identifiers with spaces

I am working on a flex/bison parser for linear temporal logic (LTL) formulas. I am also using the same scanner to read the input for these formulas.

I am using this regex to scan for identifier names:

[a-zA-Z][a-z \tA-Z0-9_-]*[a-zA-Z0-9]

In the input file, identifier names are delimited by commas; in the formulas, they are surrounded by operators or other special characters. The program handles both cases fine. Whitespace inside identifier names is not a problem in itself. I wouldn't allow it if I had the choice, but the input file is generated by another program, so I cannot change this.

This regex works, but it does not accept single-character identifiers, which I would like to support.
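A quick way to see the limitation is to try the question's pattern in Python's re module. This is only an illustration, assuming Python's character-class syntax treats this particular class the same way flex does (ranges, a space, \t for tab, and a trailing literal hyphen). `matches_whole` is a hypothetical helper, and `fullmatch` stands in for the scanner consuming one complete token.

```python
import re

# The identifier pattern from the question, unchanged.
ORIGINAL = re.compile(r"[a-zA-Z][a-z \tA-Z0-9_-]*[a-zA-Z0-9]")

def matches_whole(pattern, text):
    """Hypothetical helper: True if `pattern` matches all of `text`."""
    return pattern.fullmatch(text) is not None

# Multi-character names, including inner whitespace, are accepted.
print(matches_whole(ORIGINAL, "foo bar"))  # True

# A single character cannot satisfy both the leading [a-zA-Z]
# and the mandatory trailing [a-zA-Z0-9], so it is rejected.
print(matches_whole(ORIGINAL, "p"))        # False
```

The pattern needs at least two characters because both the first and the last character class are required.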

Basically, I need a regex that matches single characters as well as alphanumeric strings with optional whitespace between characters, without matching any leading or trailing whitespace.

Hopefully my phrasing is clear enough for you to understand.

Thanks!

Upvotes: 0

Views: 206

Answers (2)

Wiktor Stribiżew

Reputation: 626738

Use

[a-zA-Z]([a-z \tA-Z0-9_-]*[a-zA-Z0-9])?

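The (...)? makes the [a-z \tA-Z0-9_-]*[a-zA-Z0-9] part optional: the greedy ? quantifier matches one or zero occurrences of the preceding subpattern. A single letter is therefore a complete match on its own, while a longer name must still end in a letter or digit, so trailing whitespace is never included.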
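The answer's pattern can be checked the same way in Python. This is a sketch, not flex itself: the group is written as (?:...) so Python does not treat it as a capturing group (which would change what `findall` returns), whereas in a flex rule the plain (...) from the answer only groups. `is_identifier` is a hypothetical helper, and `findall` is only an approximation of how the scanner would pick names out of comma-separated input.

```python
import re

# The answer's pattern; (?:...) is Python's non-capturing form of (...).
FIXED = re.compile(r"[a-zA-Z](?:[a-z \tA-Z0-9_-]*[a-zA-Z0-9])?")

def is_identifier(text):
    """Hypothetical helper: True if all of `text` is one identifier."""
    return FIXED.fullmatch(text) is not None

print(is_identifier("p"))        # True  (single character now allowed)
print(is_identifier("foo bar"))  # True  (inner whitespace allowed)
print(is_identifier("a\tb"))     # True  (inner tab allowed)
print(is_identifier(" foo"))     # False (leading whitespace)
print(is_identifier("foo "))     # False (trailing whitespace)
print(is_identifier("9lives"))   # False (must start with a letter)

# Picking names out of comma-separated input: blanks around each
# name are left outside the match.
print(FIXED.findall("alpha beta , x,gamma_1"))  # ['alpha beta', 'x', 'gamma_1']
```

For this pattern, the greedy backtracking match should end at the same place as flex's longest match, but the flex scanner would still need its own rules for the commas and the surrounding whitespace.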


Upvotes: 0

Ben Aubin

Reputation: 5657

Try

[^,]+

It matches one or more consecutive non-comma characters. You can then trim the leading and trailing whitespace off each match.
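The match-then-trim approach can be sketched in Python as follows. `split_identifiers` is a hypothetical helper, and `str.strip()` plays the role of "trim the whitespace off".

```python
import re

def split_identifiers(line):
    """Hypothetical helper: take [^,]+ runs, then trim each one."""
    return [piece.strip() for piece in re.findall(r"[^,]+", line)]

print(split_identifiers("alpha beta , x,gamma_1"))  # ['alpha beta', 'x', 'gamma_1']

# [^,] matches everything except a comma, operators included.
print(re.findall(r"[^,]+", "p & q"))  # ['p & q']
```

Because [^,] accepts every character except a comma (operators included, and newlines too in both flex and Python), this fits the comma-separated input file, but in a formula such as p & q it would swallow the operator along with the names.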

Explanation: [diagram of the pattern from regexper.com]

Upvotes: 1
