Reputation: 8346
I have a string like this:
a b c a b " a b " b a " a "
How do I match every a that is not part of a string delimited by "? In the example above, I want to match the first a, the a after c, and the a between b and the last quoted part, but not the a inside " a b " or the a inside " a ".
I want to replace those matches (or rather remove them by replacing them with an empty string), so removing the quoted parts for matching won't work, because I want those to remain in the string. I'm using Ruby.
Upvotes: 19
Views: 13477
Reputation: 41838
js-coder, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)
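As you can see, the regex is tiny compared with the one in the accepted answer: ("[^"]*")|a. The left branch matches and captures whole quoted strings in group 1, so any a inside them is consumed as part of that match; the right branch matches only the a's outside quotes.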
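subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
# For a quoted string, $1 holds it, so it is put back unchanged;
# for a bare a, $1 is nil, which gsub turns into an empty string.
replaced = subject.gsub(regex) { $1 }
puts replaced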
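A minimal sketch of the same technique wrapped in a reusable method. The name remove_unquoted and its char parameter are hypothetical, added only for illustration; like the original, it assumes balanced quotes with no escaping.

```ruby
# Hypothetical helper: removes every occurrence of char that lies outside "..." pairs.
def remove_unquoted(subject, char)
  # Group 1 captures a complete quoted segment; the right-hand branch
  # matches the target character only where no quoted segment starts.
  regex = /("[^"]*")|#{Regexp.escape(char)}/
  # Quoted segment: $1 is the segment, so it is kept as-is.
  # Bare match: $1 is nil, and gsub substitutes nil.to_s, i.e. "".
  subject.gsub(regex) { $1 }
end

subject = 'a b c a b " a b " b a " a "'
p remove_unquoted(subject, 'a')
# prints " b c  b \" a b \" b  \" a \""
```

Regexp.escape keeps the helper safe if char is a regex metacharacter such as . or *.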
Upvotes: 10
Reputation: 336158
Assuming the quotes are correctly balanced and there are no escaped quotes, it's easy:
result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')
This replaces every a with the empty string if and only if an even number of quotes follows the matched a (counting up to the end of the string).
Explanation:
a # Match a
(?= # only if it's followed by...
(?: # ...the following:
[^"]*" # any number of non-quotes, followed by one quote
[^"]*" # the same again, ensuring an even number
)* # any number of times (0, 2, 4 etc. quotes)
[^"]* # followed by only non-quotes until
\Z # the end of the string.
) # End of lookahead assertion
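As a sketch, the annotated pattern above can be used directly in Ruby with the x (extended) flag, which ignores whitespace and treats # as a comment inside the literal. The constant name UNQUOTED_A is only an illustrative choice.

```ruby
# Same lookahead as the one-liner, written in free-spacing (/x) mode.
UNQUOTED_A = /
  a           # match an a
  (?=         # only if it is followed by...
    (?:       # ...the following group:
      [^"]*"  # any number of non-quotes, then one quote
      [^"]*"  # the same again, ensuring an even number
    )*        # any number of times (0, 2, 4 etc. quotes)
    [^"]*     # then only non-quotes until
    \Z        # the end of the string
  )           # end of lookahead assertion
/x

subject = 'a b c a b " a b " b a " a "'
puts subject.gsub(UNQUOTED_A, '')
# prints:  b c  b " a b " b  " a "
```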
If you can have escaped quotes within quotes (a "length: 2\""), it's still possible, but more complicated:
result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')
This is in essence the same regex as above, only substituting (?:\\.|[^"\\]) for [^"]:
(?: # Match either...
\\. # an escaped character
| # or
[^"\\] # any character except backslash or quote
) # End of alternation
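To see why the escape-aware version matters, here is a small sketch comparing both regexes on a made-up input containing one escaped quote. The constant names and the sample string are illustrative assumptions, not part of the answer.

```ruby
SIMPLE       = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/
ESCAPE_AWARE = /a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/

# Single quotes keep the backslash literal, so the text is: a "x\"a" a
subject = 'a "x\"a" a'

# SIMPLE counts the escaped quote as a real one (3 quotes ahead of the
# first a), so it wrongly keeps the leading a.
puts subject.gsub(SIMPLE, '')
# ESCAPE_AWARE consumes \" as a single unit, so only the two a's
# outside the quoted string are removed.
puts subject.gsub(ESCAPE_AWARE, '')
```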
Upvotes: 31
Reputation: 56809
A full-blown regex solution for regex lovers, without regard for performance or readability.
This solution assumes that there is no escaping syntax. (With escaping syntax, the a in "sbd\"a" would count as inside the string; this solution instead treats the \" as a closing quote.)
Pseudocode:
processedString =
    inputString.replaceAll("\".*?\"", "") // Remove all quoted strings
               .replaceFirst("\".*", "")  // Treat text after a lone quote as inside a quote
Then you can match the text you want in processedString. Omit the second replace if you consider text after a lone quote to be outside the quotes.
EDIT
In Ruby, the regexes in the code above would be /\".*?\"/ used with gsub and /\".*/ used with sub.
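A minimal Ruby sketch of the remove-then-match approach, under the same no-escaping assumption. Note that . does not match a newline without the m flag, so quoted parts spanning lines would not be removed here.

```ruby
input = 'a b c a b " a b " b a " a "'

# Remove every complete "..." segment; .*? is non-greedy, so each pair
# of quotes is removed separately rather than everything between the
# first and the last quote.
processed = input.gsub(/".*?"/, '')

# Treat text after a lone, unmatched quote as inside a quote and drop it.
# Skip this line if that text should count as outside the quotes.
processed = processed.sub(/".*/, '')

p processed            # the quoted parts are gone
p processed.count('a') # number of a's outside the quotes
```

This only finds the unquoted a's. As the question points out, it cannot remove them in place, because the quoted parts are no longer in processed.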
To address the replacement problem, I'm not sure whether this is possible, but it's worth trying: use /(\"|a)/ with gsub and supply a block. If the match is ", increment a counter and return " as the replacement (basically, no change). If the match is a, check whether the counter is even: if it is, return your replacement string; otherwise, just return whatever was matched.
Upvotes: 0
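A sketch of the counter idea from the last answer, assuming balanced quotes and no escaping. The method name remove_unquoted_a and its replacement parameter are hypothetical.

```ruby
# Hypothetical helper: one gsub pass over every quote and every a,
# tracking how many quotes have been seen so far.
def remove_unquoted_a(subject, replacement = '')
  quotes_seen = 0
  subject.gsub(/(\"|a)/) do |match|
    if match == '"'
      quotes_seen += 1
      match          # leave the quote itself unchanged
    elsif quotes_seen.even?
      replacement    # even count so far: this a is outside any quoted part
    else
      match          # odd count: inside quotes, keep the a
    end
  end
end

puts remove_unquoted_a('a b c a b " a b " b a " a "')
puts remove_unquoted_a('a b c a b " a b " b a " a "', 'X')
```

Because the block keeps its own state, this works in a single left-to-right pass instead of re-scanning the rest of the string for every a, as the lookahead version does.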