js-coder

Reputation: 8346

How to match something with regex that is not between two special characters?

I have a string like this:

a b c a b " a b " b a " a "

How do I match every a that is not part of a string delimited by "? I want to match every a marked with brackets here:

[a] b c [a] b " a b " b [a] " a "

I want to replace those matches (or rather remove them by replacing them with an empty string), so removing the quoted parts for matching won't work, because I want those to remain in the string. I'm using Ruby.

Upvotes: 19

Views: 13477

Answers (3)

zx81

Reputation: 41838

js-coder, resurrecting this ancient question because it had a simple solution that wasn't mentioned. (Found your question while doing some research for a regex bounty quest.)

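As you can see, the regex is really tiny compared with the regex in the accepted answer: ("[^"]*")|a

The left side of the alternation matches complete quoted strings and captures them in group 1; the right side matches a bare a. Replacing every match with group 1 puts the quoted strings back untouched and deletes each a outside them, because group 1 is empty for those matches.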

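subject = 'a b c a b " a b " b a " a "'
regex = /("[^"]*")|a/
# Quoted strings land in group 1 and are put back as-is.
# For a bare a, $1 is nil, which gsub substitutes as an empty string.
replaced = subject.gsub(regex) { $1 }
puts replaced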
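The same alternation idea can be wrapped in a small helper so the target is not hard-coded. This is only a sketch: the method name remove_outside_quotes and its parameters are made up for illustration, and it assumes balanced double quotes with no escaped quotes inside them.

```ruby
# Hypothetical helper (not from the answer): removes every occurrence of
# +target+ that is not inside a double-quoted span.
# Assumes balanced quotes and no escaped quotes.
def remove_outside_quotes(text, target)
  pattern = /("[^"]*")|#{Regexp.escape(target)}/
  # A quoted span fills group 1 and is returned unchanged;
  # a bare target leaves group 1 nil, and nil.to_s is "".
  text.gsub(pattern) { Regexp.last_match(1).to_s }
end

cleaned = remove_outside_quotes('a b c a b " a b " b a " a "', 'a')
p cleaned
# => " b c  b \" a b \" b  \" a \""
```

Regexp.escape is used so that a target such as a dot or a parenthesis is matched literally.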


Reference

How to match pattern except in situations s1, s2, s3

How to match a pattern unless...

Upvotes: 10

Tim Pietzcker

Reputation: 336158

Assuming the quotes are correctly balanced and there are no escaped quotes, then it's easy:

result = subject.gsub(/a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/, '')

This replaces each a with the empty string if and only if an even number of quotes follows the matched a.

Explanation:

a        # Match a
(?=      # only if it's followed by...
 (?:     # ...the following:
  [^"]*" #  any number of non-quotes, followed by one quote
  [^"]*" #  the same again, ensuring an even number
 )*      # any number of times (0, 2, 4 etc. quotes)
 [^"]*   # followed by only non-quotes until
 \Z      # the end of the string.
)        # End of lookahead assertion
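As a quick check, the commented pattern above can be written directly in Ruby with the /x (extended) flag and run against the string from the question. The constant name A_OUTSIDE_QUOTES is just a label chosen for this sketch.

```ruby
# Same lookahead pattern, written with /x so whitespace and comments are ignored.
A_OUTSIDE_QUOTES = /
  a                # match a
  (?=              # only if it is followed by...
    (?:
      [^"]*"       #   non-quotes, then one quote
      [^"]*"       #   the same again, so quotes come in pairs
    )*             # ...any number of quote pairs,
    [^"]*          # then only non-quotes
    \Z             # up to the end of the string
  )
/x

subject = 'a b c a b " a b " b a " a "'
result = subject.gsub(A_OUTSIDE_QUOTES, '')
p result
# => " b c  b \" a b \" b  \" a \""
```

Every a inside a quoted span has an odd number of quotes after it, so the lookahead fails there and those characters stay in place.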

If you can have escaped quotes within quotes (a "length: 2\""), it's still possible but will be more complicated:

result = subject.gsub(/a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/, '')

This is in essence the same regex as above, only substituting (?:\\.|[^"\\]) for [^"]:

(?:     # Match either...
 \\.    # an escaped character
|       # or
 [^"\\] # any character except backslash or quote
)       # End of alternation
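To see why the escape-aware version matters, here is a small comparison of the two patterns on an input that contains escaped quotes. The sample string and the constant names are invented for this sketch.

```ruby
# Pattern without escape handling (first regex in this answer).
NAIVE = /a(?=(?:[^"]*"[^"]*")*[^"]*\Z)/
# Pattern that treats backslash-escaped characters as a unit.
ESCAPE_AWARE = /a(?=(?:(?:\\.|[^"\\])*"(?:\\.|[^"\\])*")*(?:\\.|[^"\\])*\Z)/

# In a single-quoted Ruby literal, \" stays as a backslash followed by a quote.
subject = 'a "say \"a\"" a'

naive_result = subject.gsub(NAIVE, '')
aware_result = subject.gsub(ESCAPE_AWARE, '')

puts naive_result  # prints:  "say \"\""   (the a between the escaped quotes is lost)
puts aware_result  # prints:  "say \"a\""  (everything inside the quotes is kept)
```

The naive pattern counts the two escaped quotes as real delimiters, so it sees an even number of quotes after the inner a and removes it.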

Upvotes: 31

nhahtdh

Reputation: 56809

Full-blown regex solution for regex lovers, without caring about performance or code readability.

This solution assumes that there is no escaping syntax (with escaping syntax, the a in "sbd\"a" is counted as inside the string).

Pseudocode:

processedString =
    inputString.replaceAll("\".*?\"", "") // Remove all quoted strings
               .replaceFirst("\".*", "")  // Treat text after a lone quote as inside quotes

Then you can match the text you want in processedString. You can drop the second replacement if you consider text after a lone quote to be outside quotes.

EDIT

In Ruby, the regex in the code above would be

/\".*?\"/

used with gsub

and

/\".*/

used with sub
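Put together in Ruby, the two steps might look like the sketch below. The variable names are illustrative, and the same assumption applies: no escaping syntax inside quoted strings.

```ruby
input = 'a b c a b " a b " b a " a "'

processed = input
  .gsub(/".*?"/, '')  # remove all complete quoted strings
  .sub(/".*/, '')     # treat text after a lone quote as inside quotes

p processed
# => "a b c a b  b a "
puts processed.scan(/a/).length
# prints 3
```

This is enough for counting or inspecting the unquoted matches, but the positions no longer line up with the original string, which is why it does not directly solve the replacement case.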


To address the replacement problem, I'm not sure whether this is possible, but it is worth trying:

  • Declare a counter
  • Use the regex /(\"|a)/ with gsub, and supply a block.
  • In the block, if the match is ", increment the counter and return " as the replacement (basically, no change). If the match is a, check whether the counter is even: if it is even, supply your replacement string; otherwise, return whatever was matched.
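The steps above can be sketched in Ruby as follows. The method name and the replacement parameter are invented for this example. It assumes no escaped quotes, and any text after an unmatched quote counts as inside quotes.

```ruby
# Hypothetical helper implementing the counter idea from the list above.
def remove_a_outside_quotes(text, replacement = '')
  quote_count = 0  # number of quotes seen so far
  text.gsub(/"|a/) do |match|
    if match == '"'
      quote_count += 1
      match        # keep the quote unchanged
    elsif quote_count.even?
      replacement  # even count: we are outside quotes
    else
      match        # odd count: inside quotes, leave the a alone
    end
  end
end

counter_result = remove_a_outside_quotes('a b c a b " a b " b a " a "')
p counter_result
# => " b c  b \" a b \" b  \" a \""
```

Because gsub walks the string left to right, the counter always reflects how many quotes precede the current match.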

Upvotes: 0
