Reputation: 12087
Given an example input like below:
s = "an example with 'one' word and 'two and three' words inside quotes"
I'm trying to iterate over the parts outside of quotes to do some substitutions, for example to convert "and" to "&", but only outside of quotes, to get:
an example with 'one' word & 'two and three' words inside quotes
If I were to change inside of quotes, I could simply do the following:
s.gsub(/'.*?'/){ |q| q.gsub(/and/, '&') }
to get:
an example with 'one' word and 'two & three' words inside quotes
I mainly tried two things to adapt this strategy to outside of quotes.
First, I tried to negate the regexp inside the first gsub (i.e. /'.*?'/). I imagine that if there were a suffix modifier like /v, I could simply do s.gsub(/'.*?'/v){ ... }, but unfortunately I couldn't find anything like this. There is a negative lookahead (i.e. (?!pat)), but I don't think it is what I need.
Second, I tried to use split with gsub! as such:
puts s.split(/'.*?'/){ |r| r.gsub!(/and/, '&') }
Using split, I can iterate over the parts outside of quotes:
s.split(/'.*?'/){ |r| puts r }
to get:
an example with
word and
words inside quotes
However, I can't mutate these parts inside the block with gsub or gsub!. I guess I need a mutating version of split, something akin to gsub being a mutating version of scan, but there doesn't seem to be anything like this.
Is there an easy way to make either of these approaches work?
Upvotes: 2
Views: 57
Reputation: 626845
You may match and capture what you need to keep and just match what you need to replace.
Use
s.gsub(/('[^']*')|and/) { $1 || '&' }
s.gsub(/('[^']*')|and/) { |m| m == $~[1] ? $~[1] : '&' }
If you need to match "and" only as a whole word, use \band\b in the pattern instead of and.
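As a quick check of the above, here is a minimal sketch that runs the capture-and-skip pattern on the string from the question and then contrasts and with \band\b. The second input string and the variable names are made up for illustration.

```ruby
# Input taken from the question.
s = "an example with 'one' word and 'two and three' words inside quotes"

# Quoted parts are captured in group 1 and put back unchanged;
# a bare "and" leaves group 1 empty (nil), so the block returns '&'.
outside = s.gsub(/('[^']*')|\band\b/) { $1 || '&' }
puts outside
# => an example with 'one' word & 'two and three' words inside quotes

# Hypothetical input showing why the word boundaries matter.
t = "sand and 'rock and roll'"
loose  = t.gsub(/('[^']*')|and/)     { $1 || '&' }
strict = t.gsub(/('[^']*')|\band\b/) { $1 || '&' }
puts loose   # => s& & 'rock and roll'
puts strict  # => sand & 'rock and roll'
```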
This approach is very convenient since you can add as many patterns to skip as you want. E.g., to also avoid matching the whole word "and" between double quotation marks:
s.gsub(/('[^']*'|"[^"]*")|\band\b/) { $1 || '&' }
Or, to also skip quoted strings that contain escaped quotes:
s.gsub(/('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")|\band\b/m) { $1 || '&' }
Or, to replace it only when it appears outside of round, square, and angle brackets and braces:
s.gsub(/(<[^<>]*>|\{[^{}]*\}|\([^()]*\)|\[[^\]\[]*\])|\band\b/m) { $1 || '&' }
Match and capture substrings between single quotes and just match what you need to change. If Group 1 matches, put it back with $1; otherwise, replace with &. The replacement block in the second line just checks whether the Group 1 value of the last match is the same as the currently matched value; if so, it puts it back, otherwise it replaces the match with &.
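The extended skip patterns can be tried out the same way. The sketch below uses made-up sample strings (they are not from the question) and the same { $1 || '&' } block; the bracket pattern does not handle nested brackets.

```ruby
# Hypothetical sample inputs, chosen only to exercise each pattern.
mixed    = %q{salt and "pepper and chili" and 'oil and vinegar'}
escaped  = "'it\\'s here and there' and more"   # contains a backslash-escaped quote
brackets = "a and (b and c) and [d and e] and {f and g} and <h and i>"

# Skip single- or double-quoted substrings.
quotes_re = /('[^']*'|"[^"]*")|\band\b/
mixed_out = mixed.gsub(quotes_re) { $1 || '&' }

# Skip quoted substrings that may contain backslash-escaped characters.
escaped_re = /('[^'\\]*(?:\\.[^'\\]*)*'|"[^"\\]*(?:\\.[^"\\]*)*")|\band\b/m
escaped_out = escaped.gsub(escaped_re) { $1 || '&' }

# Skip substrings inside <>, {}, () and [] (one level only, no nesting).
brackets_re = /(<[^<>]*>|\{[^{}]*\}|\([^()]*\)|\[[^\]\[]*\])|\band\b/m
brackets_out = brackets.gsub(brackets_re) { $1 || '&' }

puts mixed_out     # => salt & "pepper and chili" & 'oil and vinegar'
puts escaped_out   # => 'it\'s here and there' & more
puts brackets_out  # => a & (b and c) & [d and e] & {f and g} & <h and i>
```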
See a Ruby demo.
Regex details
('[^']*') - Capturing group #1: ', zero or more chars other than ' and then a ' char
| - or
and - an "and" substring.
Upvotes: 1
Reputation: 110685
You can perform the desired substitutions by using the following regular expression.
r = /\G[^'\n]*?(?:'[^'\n]*'[^'\n]*?)*?\K\band\b/
The Ruby code needed is as follows.
str = "an and with 'one' word and 'two and three' words and end"
str.gsub(r, '&')
#=> "an & with 'one' word & 'two and three' words & end"
Ruby's regex engine performs the following operations. Essentially, the regex asserts that "and" follows an even number of single quotes since the previous match, or an even number of single quotes from the beginning of the string if it is the first match.
\G : asserts position at the end of the previous match
or the start of the string for the first match
[^'\n]*? : match 0+ chars other than ' and \n, lazily
(?: : begin non-capture group
'[^'\n]*' : match ' then 0+ chars other than ' and \n then '
[^'\n]*? : match 0+ chars other than ' and \n, lazily
) : end non-capture group
*? : execute non-capture group 0+ times, lazily
\K : forget everything matched so far and reset start of match
\band\b : match 'and' as a whole word
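For comparison, here is a small sketch applying the same regular expression to the string from the question. The variable names s and converted are only illustrative.

```ruby
# Same regular expression as above.
r = /\G[^'\n]*?(?:'[^'\n]*'[^'\n]*?)*?\K\band\b/

# Input taken from the question.
s = "an example with 'one' word and 'two and three' words inside quotes"

# Each match is anchored by \G at the end of the previous one, and only
# the text after \K (the word "and") is replaced.
converted = s.gsub(r, '&')
puts converted
# => an example with 'one' word & 'two and three' words inside quotes
```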
Upvotes: 1