Reputation: 1937
I'm looking to write a regex (C#) that will match words that aren't surrounded by quotes. An example input string would be:
dbo.test line_length "quoted words" notquoted
And this needs to match
dbo.test
line_length
notquoted
So 3 separate matches and "quoted words" is not matched. The quoted phrase could be anywhere in the input...beginning, middle, end, etc.
I haven't been able to come up with a regex that matches words not in quotes where there could be a space in the quotes...I've been able to match something like: hello "world" and only get hello.
Is there a way to write the regex I'm after?
Upvotes: 0
Views: 212
Reputation: 4069
There are two ways to tackle this, depending on what you want to do with the output.
First, match (but don't capture) any text within quotation marks. (This is specifically matching the stuff that you DON'T want.)
Then, using the | pipe (alternation), add a capture group that selects everything that you DO want to keep.
Example:
".*?"|(\b\S+\b)
You can see an example of that here.
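In C#, the alternation approach could be sketched roughly as follows. The class and method names (UnquotedWords, Extract) are made up for illustration; the only thing taken from the answer is the pattern itself. The key point is that quoted phrases still produce a match, so the code has to check whether group 1 actually participated.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative, not from the original answer.
public static class UnquotedWords
{
    // Left alternative: consume a quoted phrase without capturing it.
    // Right alternative: capture an unquoted word in group 1.
    private static readonly Regex Pattern =
        new Regex(@""".*?""|(\b\S+\b)");

    public static List<string> Extract(string input)
    {
        var words = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Group 1 succeeds only when the right alternative matched,
            // so matches that consumed a quoted phrase are skipped here.
            if (m.Groups[1].Success)
            {
                words.Add(m.Groups[1].Value);
            }
        }
        return words;
    }
}
```

Calling UnquotedWords.Extract on the example input from the question should return dbo.test, line_length and notquoted, with the quoted phrase skipped.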
The other option, using look-arounds, is to look backward from the beginning of each word and forward from its end to ensure that a " doesn't appear on either side:
(?<!")(\b\S+\b)(?!")
You can see that here.
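The look-around version can break down when a quoted phrase contains more than two words, because the words in the middle have no quotation mark directly beside them. Still, this should get you on the right track, and you can indicate whether one of these methods works better for you than the other.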
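For comparison, here is a rough C# sketch of the look-around variant. Again, the names (LookaroundWords, Extract) are hypothetical and only the pattern comes from the answer. It is meant to show both where the pattern works and where it stops working.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative, not from the original answer.
public static class LookaroundWords
{
    // A word is accepted only if no quote sits directly before or after it.
    private static readonly Regex Pattern =
        new Regex(@"(?<!"")(\b\S+\b)(?!"")");

    public static List<string> Extract(string input)
    {
        var words = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            words.Add(m.Groups[1].Value);
        }
        return words;
    }
}
```

With the input from the question, Extract should give dbo.test, line_length and notquoted, since both quoted words touch a quotation mark. With an input such as a "x y z" b, it should give a, y and b: the y in the middle of the quoted phrase has whitespace on both sides, so neither look-around rejects it. If quoted phrases can be longer than two words, the alternation pattern is probably the safer choice.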
Upvotes: 0