Replace occurrences within capture groups using Regex

Question

tl;dr: How do I replace only specific characters (e.g. line breaks) within a regex match in Ruby?

I have an array of strings. Each element consists of 2 to 4 words (a word being any sequence of characters) separated by spaces, in a specific order.

I also have a large string in which I want to find instances of those word sequences that are broken by a line break instead of a space. For example, I want to match an element of the array:

arr[0] = "aaa bbbb ccccc"

to a string that looks like this:

zzzzzzzzz aaa

bbbb ccccc yyyyyyyyy

And make it look like this:

zzzzzzzzz aaa bbbb ccccc yyyyyyyyy

The thing is, I can think of at least two ways of doing it, but they seem very cumbersome. What I would do is:

1. replace each space in the array with [ ]
2. generate a regex with Regexp.union comprising all elements of the array
3. use the regex to match instances of my arr elements in the string
4. generate a .gsub! for each string so that it does not replace the entire match, but only elements of the match (or use multiple capture groups)

I suspect, however, that this is a rather silly way to do it. Is there a less roundabout way to do it in Ruby?

EDIT: How do I implement the answer below with Regexp.union? I have a function that generates the regex:

def generateMergeRx(arr_with_keywords)
    arr_with_keywords.delete_if{|x| (x.include? " ") == false}
    matchRegexMerge = Regexp.new("(%{keywordReplace})" % {
        keywordReplace: Regexp.union(arr_with_keywords).source
    })
end

This is what it looks like using puts regexMerge.to_s:

(?-mix:(And\.\ z\ Kobyl\.|Ban\.\ W\.|B\.\ B\.|B\.\ G\.|Biel\.\ J\.)

It corresponds to these entries:

And. z Kobyl.
Ban. W.
B. B.
B. G.
Biel. J.
(...)

And then I call it like this:

regexMerge = generateMergeRx arr_with_keywords
some_string.gsub!(regexMerge.to_s.gsub!(/ /, "\s"), "\1")

But what should I put instead of \1? At the moment the output is identical to the input.

Aleksei Matiushkin · Accepted Answer

▶ str = 'zzzzzzzzz aaa
▷ bbbb ccccc yyyyyyyyy'
▶ re = "aaa bbbb ccccc"
▶ str.gsub /#{re.gsub(/ +/, '\s+')}/, re
#⇒ "zzzzzzzzz aaa bbbb ccccc yyyyyyyyy"

The general idea is to match any whitespace between the words, including line breaks, and to replace each match with the original string.
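
For the array case raised in the EDIT, the same idea might be extended roughly as follows. This is only a sketch under assumptions: build_merge_regex is a hypothetical helper name, the sample keywords and text are made up for illustration, and it assumes every keyword should be normalised to single spaces between its words. Each keyword is split into words, the words are escaped individually and rejoined with \s+, and the block form of gsub collapses the whitespace inside whichever alternative matched, so no back-reference such as \1 is needed.

```ruby
# Hypothetical helper (not from the question): builds one regex that matches
# any multi-word keyword with its words separated by arbitrary whitespace.
def build_merge_regex(keywords)
  # Only multi-word keywords can be broken across lines.
  multiword = keywords.select { |k| k.include?(' ') }
  alternatives = multiword.map do |k|
    # Escape each word separately, then allow any whitespace between words.
    k.split.map { |word| Regexp.escape(word) }.join('\s+')
  end
  Regexp.new(alternatives.join('|'))
end

# Illustrative data, assumed for this example only.
keywords = ['aaa bbbb ccccc', 'And. z Kobyl.', 'single']
text = "zzzzzzzzz aaa\n\nbbbb ccccc yyyyyyyyy And.\nz Kobyl. end"

# The block receives each matched keyword; only the whitespace inside
# the match is rewritten, the rest of the text is left untouched.
merged = text.gsub(build_merge_regex(keywords)) { |match| match.gsub(/\s+/, ' ') }
puts merged
# prints "zzzzzzzzz aaa bbbb ccccc yyyyyyyyy And. z Kobyl. end"
```

Two details are worth keeping in mind with this approach. In a double-quoted Ruby string, "\s" is a literal space and "\1" is the character with code 1, which is why the regex fragments above are written in single quotes. Also, gsub treats a String pattern as literal text, so the pattern has to be passed as a Regexp rather than as the result of to_s.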
