captcoma

Reputation: 1898

Why does gsubfn omit part of the match?

I am analysing text strings and trying to replace all dots (.) within round brackets () with commas (,).

I found a regex that matches everything within the brackets:

text <- "let's count (get . this . without dots) the days?"
brackets = "\\((.*?)\\)"
regmatches(text,regexpr(brackets,text))

gives me:

[1] "(get . this . without dots)"

As described here, I could use gsubfn to make the changes:

library(gsubfn)
gsubfn(brackets, ~ gsub("\\.", ",",x), text)

gives me:

[1] "let's count get , this , without dots the days?"

instead of what I thought I would get:

[1] "let's count (get , this , without dots) the days?"

Why does gsubfn omit part of my match (i.e. the brackets)? Is there any other way to replace the . within () with , ?

Upvotes: 2

Views: 72

Answers (3)

Ronak Shah

Reputation: 389175

We can solve this using base R gsub with a positive lookahead.

gsub("\\.(?=[^()]*\\))", ",", text, perl = TRUE)
#[1] "let's count (get , this , without dots) the days?"

This matches a dot (.) only if the first round bracket to its right is a closing one. [^()] matches any character that is not a round bracket. The dot is then replaced with a comma.
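
To check that the lookahead leaves dots outside the brackets alone, here is a small sketch on a made-up string (text2 is my own example, not from the question). It assumes the brackets are balanced and not nested.

```r
# Hypothetical test string with dots both inside and outside round brackets
text2 <- "a.b (c.d) e.f (g.h)"

# Replace a dot only when the next round bracket to its right is a closing one
res <- gsub("\\.(?=[^()]*\\))", ",", text2, perl = TRUE)
print(res)
#[1] "a.b (c,d) e.f (g,h)"
```

The dots in a.b and e.f stay untouched because the first bracket to their right is an opening one (or there is none before the next opening bracket).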

Upvotes: 1

Wiktor Stribiżew

Reputation: 627100

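When the pattern contains capturing groups, gsubfn by default passes only the captured groups to the function, not the whole match, which is why the brackets are dropped. You may keep as many capturing groups as you need in the original regex. There is no need to modify the pattern; just tell gsubfn to use the whole match by passing the backref=0 argument: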

gsubfn("\\((.*?)\\)", ~ gsub("\\.", ",",x), text, backref=0)
[1] "let's count (get , this , without dots) the days?"

Upvotes: 3

Onyambu

Reputation: 79298

What you have done here is leave the parentheses/brackets out of your capturing group, i.e. you did not capture the brackets. Try:

text <- "let's count (get . this . without dots) the days?"
brackets = "(\\(.*?\\))" # NOTE THAT I CAPTURED THE PARENTHESES TOO
regmatches(text,regexpr(brackets,text))
[1] "(get . this . without dots)"


library(gsubfn)
gsubfn(brackets, ~ gsub("\\.", ",",x), text)
[1] "let's count (get , this , without dots) the days?"
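
If you would rather not depend on gsubfn, a base R sketch of the same idea is to locate the whole bracketed match with gregexpr and rewrite it through `regmatches<-`. Note that I use [^()]* here instead of .*?, which assumes the brackets are not nested; the variable names m and out are my own.

```r
text <- "let's count (get . this . without dots) the days?"

# Find every "(...)" span, brackets included
m <- gregexpr("\\([^()]*\\)", text)

# Swap dots for commas inside each matched span, then write the spans back
out <- text
regmatches(out, m) <- lapply(regmatches(out, m),
                             function(x) gsub(".", ",", x, fixed = TRUE))
print(out)
#[1] "let's count (get , this , without dots) the days?"
```

Because the whole match (brackets included) is what gets rewritten, nothing is lost in the replacement.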

Upvotes: 3
