Reputation: 3123
I would like to patch some text data extracted from web pages. Sample:
t="First sentence. Second sentence.Third sentence."
There is no space after the period at the end of the second sentence. This tells me that the third sentence was on a separate line (after a br tag) in the original document.
I want to use a regexp to insert a "\n" character in the proper places and patch my text. My regex:
t2=t.gsub(/([.\!?])([A-Z1-9])/,$1+"\n"+$2)
But unfortunately it doesn't work: "NoMethodError: undefined method `+' for nil:NilClass". How can I properly refer back to the matched groups? It was so easy in Microsoft Word, where I just had to use the \1 and \2 symbols.
Upvotes: 31
Views: 20116
Reputation: 31726
You can backreference in the substitution string with \1 (to refer to capture group 1). However, the literal backslash has to be escaped as \\ when using a double-quoted string literal:
t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."
t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2")
#=> "First sentence. Second sentence.\nThird sentence!\nFourth sentence?\nFifth sentence."
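As a small sketch of why the original attempt raised NoMethodError: Ruby evaluates the argument `$1+"\n"+$2` before gsub is ever called, and at that point no match has happened yet, so `$1` is still nil. The snippet below (the variable names `single` and `double` are only for illustration) shows the single-quoted and double-quoted spellings producing the same result on the question's sample text.

```ruby
t = "First sentence. Second sentence.Third sentence."

# Single-quoted pieces: '\1' and '\2' reach gsub with the backslash intact,
# while the newline still comes from a double-quoted "\n".
single = t.gsub(/([.!?])([A-Z1-9])/, '\1' + "\n" + '\2')

# Double-quoted string: each backslash has to be doubled.
double = t.gsub(/([.!?])([A-Z1-9])/, "\\1\n\\2")

puts single == double
puts single.inspect
```

Both calls hand gsub the same replacement string, so which spelling to use is mostly a matter of readability.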
Upvotes: 36
Reputation: 547
If you got here because of Rubocop complaining "Avoid the use of Perl-style backrefs." about $1, $2, etc., you can do this instead:
some_id = $1
# or
some_id = Regexp.last_match[1] if Regexp.last_match
some_id = $5
# or
some_id = Regexp.last_match[5] if Regexp.last_match
It'll also want you to do
%r{//}.match(some_string)
instead of
some_string[//]
Lame (Rubocop)
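Another way to avoid the globals entirely, sketched here under the assumption that you only need a capture from one specific match, is to keep the MatchData in a local variable or to use String#[] with a capture index. The sample string and the names `line`, `some_id`, `same_id` and `missing` are made up for illustration.

```ruby
line = "user_id=42 status=ok"

# Keep the MatchData in a local instead of relying on $1 / Regexp.last_match.
m = /user_id=(\d+)/.match(line)
some_id = m && m[1]

# String#[] with a regexp and a capture index returns that group directly.
same_id = line[/user_id=(\d+)/, 1]

# When nothing matches, the result is nil rather than an exception.
missing = "status=ok"[/user_id=(\d+)/, 1]

puts some_id
puts same_id
puts missing.inspect
```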
Upvotes: 9
Reputation: 168071
If you are using gsub(regex, replacement), then use '\1', '\2', ... to refer to the match. Make sure not to put double quotes around the replacement, or else escape the backslash as in Joshua's answer. The conversion from '\1' to the match will be done within gsub, not by literal interpretation.
If you are using gsub(regex){replacement}, then use $1, $2, ...
But for your case, it is easier not to use matches:
t2 = t.gsub(/(?<=[.\!?])(?=[A-Z1-9])/, "\n")
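To make the comparison concrete, here is a minimal sketch that runs the block form and the lookaround form side by side on a longer sample string. The variable names are illustrative only. In the block form, `$1` and `$2` are safe to use because the block runs after each match has been made.

```ruby
t = "First sentence. Second sentence.Third sentence!Fourth sentence?Fifth sentence."
pattern = /([.!?])([A-Z1-9])/

# Block form: gsub sets $1 and $2 before calling the block for each match.
with_globals = t.gsub(pattern) { "#{$1}\n#{$2}" }

# Same block without the Perl-style globals.
with_last_match = t.gsub(pattern) do
  "#{Regexp.last_match(1)}\n#{Regexp.last_match(2)}"
end

# Lookbehind plus lookahead match the empty gap between the two characters,
# so nothing has to be captured or re-inserted.
with_lookaround = t.gsub(/(?<=[.!?])(?=[A-Z1-9])/, "\n")

puts with_globals == with_lookaround
puts with_lookaround.inspect
```

The lookaround version consumes no characters, which is why the replacement can be a plain "\n" with no backreferences at all.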
Upvotes: 27