Reputation: 2460
I have a big text file. Within this text file, I want to replace all mentions of the word 'pizza' with 'spinach', 'Pizza' with 'Spinach', and 'pizzing' with 'spinning', unless those words occur anywhere within curly braces. So {pizza}, {giant.pizza} and {hot-pizza-oven} should remain unchanged.
My best proposed solution so far is to iterate over the file line by line, use a regex to pull out everything before a { or after a }, and then apply the replacement regexes to each of those pieces. But that gets complex and unwieldy, and I want to know if there's a proper solution for this problem.
Upvotes: 0
Views: 71
Reputation: 2381
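rules = {'pizza' => 'spinach', 'Pizza' => 'Spinach', 'pizzing' => 'spinning'}
# match either a whole {...} group or one of the words to replace
regexp = /\{[^{}]*\}|#{rules.keys.join('|')}/m
# file is an already-opened File; bracketed matches are not keys of rules,
# so rules[s] || s returns them unchanged
puts(file.read.gsub(regexp) { |s| rules[s] || s })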
This constructs a regular expression that matches either a bracketed string or one of the strings to replace. Each match then goes through a block that returns the replacement for a target word and leaves bracketed strings unchanged. The /m flag is harmless but not actually needed: in Ruby the negated class [^{}] already matches newlines, so bracketed text spanning several lines is skipped with or without it. Either way, there's no need to iterate line by line.
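To try the single-regex approach without a file, here is a minimal sketch that runs it on an in-memory string built from the question's examples (the sample text itself is made up for illustration). It also checks that a bracketed group broken across a line is left alone even without /m:

```ruby
rules = {'pizza' => 'spinach', 'Pizza' => 'Spinach', 'pizzing' => 'spinning'}

# First alternative: a whole {...} group; the rest: the words to swap.
# No /m flag here: the negated class [^{}] matches "\n" on its own.
regexp = /\{[^{}]*\}|#{rules.keys.join('|')}/

text = "Pizza and pizzing, but not {pizza}, {giant.pizza},\n" \
       "{hot-pizza-oven} or {hot-\npizza-oven}."

# Bracketed groups are not keys of rules, so they map back to themselves.
result = text.gsub(regexp) { |s| rules[s] || s }
puts result
```

Because gsub scans left to right, it reaches the { of a group before any word inside it, so the whole group is consumed as one match and never reaches the replacement table.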
Upvotes: 1
Reputation: 110725
I would call the following method for each line of the file.
Code
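def doit(line)
  replace = {'pizza'=>'spinach', 'Pizza'=>'Spinach', 'pizzing'=>'spinning'}
  r = /\{.*?\}/
  # split on the bracketed groups; the -1 limit keeps trailing empty strings,
  # so there is always one more piece than there are groups
  arr = line.split(r, -1).map { |str|
    str.gsub(/\b(?:pizza|Pizza|pizzing)\b/, replace) }
  # interleave the replaced pieces with the untouched bracketed groups
  line.scan(r).each_with_object(arr.shift) { |str,res|
    res << str << arr.shift }
end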
Examples
doit("Pizza Primastrada's {pizza} is the best {pizzing} pizza in town.")
#=> "Spinach Primastrada's {pizza} is the best {pizzing} spinach in town."
doit("{Pizza Primastrada}'s pizza is the best pizzing {pizza} in town.")
#=> "{Pizza Primastrada}'s spinach is the best spinning {pizza} in town."
Explanation
line = "Pizza Primastrada's {pizza} is the best {pizzing} pizza in town."
replace = {'pizza'=>'spinach', 'Pizza'=>'Spinach', 'pizzing'=>'spinning'}
r = /\{.*?\}/
a = line.split(r, -1)
#=> ["Pizza Primastrada's ", " is the best ", " pizza in town."]
b = a.map { |str| str.gsub(/\b(?:pizza|Pizza|pizzing)\b/, replace) }
#=> ["Spinach Primastrada's ", " is the best ", " spinach in town."]
keepers = line.scan(r)
#=> ["{pizza}", "{pizzing}"]
keepers.each_with_object(b.shift) { |str,res| res << str << b.shift }
#=> "Spinach Primastrada's {pizza} is the best {pizzing} spinach in town."
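One edge case in this split/scan interleaving: String#split drops trailing empty strings by default, so a line that ends with a bracketed group yields one piece too few, and the last arr.shift returns nil (appending nil to a String raises a TypeError). A negative limit keeps the empty piece. A quick check, using a made-up sample line:

```ruby
r = /\{.*?\}/
line = "best {pizza}"

# Default split drops the trailing empty string: one piece for one group.
p line.split(r)      # => ["best "]
# A negative limit keeps it, so pieces outnumber groups by exactly one.
p line.split(r, -1)  # => ["best ", ""]
p line.scan(r)       # => ["{pizza}"]
```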
Nested braces
If you wish to permit nested braces, change the regex to:
r = /\{[^{}]*?(?:\{.*?\})*?[^{}]*?\}/
doit("Pizza Primastrada's {{great {great} pizza} is the best pizza.")
#=> "Spinach Primastrada's {{great {great} pizza} is the best spinach."
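The regex above relies on lazy .*? matching, so it is best treated as a heuristic. For braces nested to any depth, Ruby's subexpression calls (\g<name>) can match balanced braces directly. Below is a minimal sketch; swap_outside_braces, REPLACE and PATTERN are hypothetical names, and the replacement words come from the question. Note that an unmatched { protects nothing here:

```ruby
REPLACE = {'pizza' => 'spinach', 'Pizza' => 'Spinach', 'pizzing' => 'spinning'}

# (?<braced>...) is a named group that calls itself via \g<braced>,
# so it matches a {...} group whose inner braces are balanced, at any depth.
# The second alternative matches the words to swap.
PATTERN = /(?<braced>\{(?:[^{}]|\g<braced>)*\})|\b(?:pizza|Pizza|pizzing)\b/

# Hypothetical helper: swap the words everywhere except inside braces.
def swap_outside_braces(line)
  # Bracketed groups are not keys of REPLACE, so fetch returns them unchanged.
  line.gsub(PATTERN) { |m| REPLACE.fetch(m, m) }
end

puts swap_outside_braces("Pizza {a {pizza {pizzing}} b} and pizza {x} pizzing")
# prints: Spinach {a {pizza {pizzing}} b} and spinach {x} spinning
```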
You referred to the string {words,salad,#{1,2,3} pizza|} in a comment. If that is part of a string enclosed in single quotes, not a problem. If it is enclosed in double quotes, however, #{1,2,3} is treated as interpolation, and since 1,2,3 on its own is not a valid expression, Ruby raises a syntax error. Again, no problem if the pound character is escaped (\#).
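A small sketch of the two safe ways to write that string; both produce exactly the same characters:

```ruby
# Single quotes: #{...} is literal text, no interpolation happens.
single = '{words,salad,#{1,2,3} pizza|}'

# Double quotes: escaping the pound sign (\#) turns off interpolation.
# Without the backslash, Ruby would try to interpolate 1,2,3 and fail to parse.
double = "{words,salad,\#{1,2,3} pizza|}"

puts single == double  # prints: true
```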
Upvotes: 0
Reputation: 80075
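str = "Pizza {pizza} with spinach is not pizzing."
# bracketed forms map to themselves, so they come through unchanged
swaps = {'{pizza}'   => '{pizza}',
         '{Pizza}'   => '{Pizza}',
         '{pizzing}' => '{pizzing}',
         'pizza'     => 'spinach',
         'Pizza'     => 'Spinach',
         'pizzing'   => 'spinning'}
# gsub scans left to right, so "{pizza}" is consumed whole before its inner "pizza" can match
regex = Regexp.union(swaps.keys)
p str.gsub(regex, swaps) # => "Spinach {pizza} with spinach is not spinning."
Note that this protects only the exact bracketed strings listed in swaps: {giant.pizza} would still become {giant.spinach}.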
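To protect every bracketed group rather than a fixed list, one option is to put a general {...} pattern into the union and switch from a hash to a block. A minimal sketch (the sample string is made up for illustration):

```ruby
rules = {'pizza' => 'spinach', 'Pizza' => 'Spinach', 'pizzing' => 'spinning'}

# Regexp.union accepts a mix of Regexp objects and strings (strings are
# escaped), so any {...} group can sit alongside the literal words.
regex = Regexp.union(/\{[^{}]*\}/, *rules.keys)

str = "Pizza {pizza} and {giant.pizza} with spinach is not pizzing."
# fetch(m, m) returns the match itself when it is not a key (bracketed groups).
result = str.gsub(regex) { |m| rules.fetch(m, m) }
puts result  # prints: Spinach {pizza} and {giant.pizza} with spinach is not spinning.
```

The block matters here: when gsub is given a hash, any match that is not a key is replaced with an empty string, so bracketed groups that aren't listed as keys would simply disappear.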
Upvotes: 0
Reputation: 23317
This can be done in a few steps. I'd iterate through the file line by line, and pass each line to this method:
def spinachize line
  # list of words to swap
  swaps = {
    'pizza' => 'spinach',
    'Pizza' => 'Spinach',
    'pizzing' => 'spinning'
  }
  # random placeholder for bracketed text
  placeholder = 'fdjfafdlskdsfajkldfas'
  # save all instances of bracketed text
  bracketed_text = line.scan(/\{.*?\}/)
  # remove bracketed text from line (non-destructive gsub, so the caller's
  # string is not modified; the gsub! calls below work on this new copy)
  line = line.gsub(/\{.*?\}/, placeholder)
  # replace all swaps
  swaps.each do |original_text, new_text|
    line.gsub!(original_text, new_text)
  end
  # re-insert bracketed text
  line.gsub(placeholder){bracketed_text.shift}
end
The comments above explain things as we go. Here are a couple of examples:
spinachize "Pizza is good, but more pizza is better"
=> "Spinach is good, but more spinach is better"
spinachize "Leave bracketed instances of {pizza} or {this.pizza} alone"
=> "Leave bracketed instances of {pizza} or {this.pizza} alone"
As you can see, you can specify the items you want swapped, or modify the method to pull the list from a database or flat file somewhere. The placeholder just needs to be something unique that wouldn't come up in the source file naturally.
The process is this: remove bracketed text from the original line, and remember it for later. Swap all text that needs swapping, then add back the bracketed text. It's not a one-liner, but it works well and is readable and easy to update.
The last line of the method might need some clarification. Not many people know that the "gsub" method can take a block instead of a second parameter. That block then determines what gets put in place of the original text. In this case, each time the block is called, it removes the first item from the saved bracket list and uses that as the replacement.
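A minimal sketch contrasting the two forms, with a short made-up placeholder 'PH' standing in for the long random one:

```ruby
saved = ['{pizza}', '{this.pizza}']
text  = 'Leave PH or PH alone'   # 'PH' is a stand-in placeholder

# With a second argument, the same string is inserted at every match.
fixed = text.gsub('PH', saved.first)
puts fixed     # prints: Leave {pizza} or {pizza} alone

# With a block, the block runs once per match and its return value is used,
# so shifting from the saved list restores each bracketed string in order.
restored = text.gsub('PH') { saved.shift }
puts restored  # prints: Leave {pizza} or {this.pizza} alone
```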
Upvotes: 2