Reputation: 103
I am trying to replace commas within all sets of parentheses with a semicolon, but not change any commas outside of the parentheses.
So, for example:
"a, b, c (1, 2, 3), d, e (4, 5)"
should become:
"a, b, c (1; 2; 3), d, e (4; 5)"
I have started attempting this with gsub, but I am having a really hard time figuring out how to identify the commas within the parentheses.
I would call myself an advanced beginner with R, but with regular expressions and text manipulations, a total noob. Any help you can provide would be great.
Upvotes: 5
Views: 815
Reputation: 626689
The simplest solution
A common workaround that works as long as all parentheses are balanced and not nested:
,(?=[^()]*\))
R code:
a <- "a, b, c (1, 2, 3), d, e (4, 5)"
gsub(",(?=[^()]*\\))", ";", a, perl = TRUE)
## [1] "a, b, c (1; 2; 3), d, e (4; 5)"
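A small sketch of where the lookahead pattern holds and where it breaks down. The extra input strings are made up for illustration and are not from the question:

```r
# Lookahead pattern from above: a comma followed by non-parenthesis
# characters and then a closing parenthesis.
pattern <- ",(?=[^()]*\\))"

# Flat, balanced parentheses: only the inner commas are replaced.
flat <- gsub(pattern, ";", "a, b, c (1, 2, 3), d, e (4, 5)", perl = TRUE)
flat
## [1] "a, b, c (1; 2; 3), d, e (4; 5)"

# Nested parentheses: the comma after "1" is followed by "(" before any ")",
# so the lookahead fails and that comma is left unchanged.
nested <- gsub(pattern, ";", "a (1, (2, 3), 4)", perl = TRUE)
nested
## [1] "a (1, (2; 3); 4)"

# A stray closing parenthesis: the comma before it is replaced even though
# no parenthesis was ever opened.
stray <- gsub(pattern, ";", "a, b) c", perl = TRUE)
stray
## [1] "a; b) c"
```

The pattern only looks ahead for a closing parenthesis and never checks for an opening one, which is why it depends on the input being well formed.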
The regex matches...
- , - a comma if...
- (?=[^()]*\)) - it is followed by 0 or more characters other than ( or ) (with [^()]*) and a literal ).
Alternative solutions
If you need to make sure only commas inside the closest open and close parentheses are replaced, it is safer to use a gsubfn-based approach:
library(gsubfn)
x <- 'a, b, c (1, 2, 3), d, e (4, 5)'
gsubfn('\\(([^()]*)\\)', function(match) gsub(',', ';', match, fixed=TRUE), x, backref=0)
## => [1] "a, b, c (1; 2; 3), d, e (4; 5)"
Here, \(([^()]*)\) matches (, then zero or more chars other than ( and ), and then ). After that, the match found is passed to the anonymous function, where all , chars are replaced with semicolons using gsub.
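If you would rather not depend on gsubfn, the same idea can be sketched in base R with gregexpr and the regmatches<- replacement function. This is an alternative to the approach above, not part of it:

```r
x <- "a, b, c (1, 2, 3), d, e (4, 5)"

# Locate every "(...)" chunk that contains no nested parentheses.
m <- gregexpr("\\([^()]*\\)", x)

# Swap commas for semicolons inside each matched chunk only, then write
# the modified chunks back into their original positions.
regmatches(x, m) <- lapply(
  regmatches(x, m),
  function(chunk) gsub(",", ";", chunk, fixed = TRUE)
)

x
## [1] "a, b, c (1; 2; 3), d, e (4; 5)"
```

Because the comma replacement runs only on text that matched an opening and a closing parenthesis, a stray ) elsewhere in the string does not trigger any changes.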
If you need to perform this replacement inside balanced parentheses of unknown nesting depth, use a recursive PCRE regex with gsubfn:
x1 <- 'a, b, c (1, (2, (3, 4)), 5), d, e (4, 5)'
gsubfn('\\(((?:[^()]++|(?R))*)\\)', function(match) gsub(',', ';', match, fixed=TRUE), x1, backref=0, perl=TRUE)
## => [1] "a, b, c (1; (2; (3; 4)); 5), d, e (4; 5)"
Pattern details
\( # Open parenthesis
( # Start group 1
(?: # Start of a non-capturing group:
[^()]++ # Any 1 or more chars other than '(' and ')'
| # OR
(?R) # Recursively match the entire pattern
)* # End of the non-capturing group and repeat it zero or more times
) # End of group 1 (with backref=0, the whole match, parentheses included, is what gets passed to the function as `match`)
\) # A literal ')'
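The recursive pattern can also be tried without gsubfn. The sketch below assumes base R's gregexpr with perl = TRUE accepts the (?R) recursion construct, and the helper name replace_in_parens is hypothetical, introduced only for this example:

```r
# Hypothetical helper: replace commas with semicolons inside every
# balanced, possibly nested, parenthesised chunk of each string.
replace_in_parens <- function(x) {
  # Same recursive pattern as above, minus the capturing group, since
  # only the whole match is needed here.
  pattern <- "\\((?:[^()]++|(?R))*\\)"
  m <- gregexpr(pattern, x, perl = TRUE)
  regmatches(x, m) <- lapply(
    regmatches(x, m),
    function(chunk) gsub(",", ";", chunk, fixed = TRUE)
  )
  x
}

x1 <- "a, b, c (1, (2, (3, 4)), 5), d, e (4, 5)"
res <- replace_in_parens(x1)
res
## [1] "a, b, c (1; (2; (3; 4)); 5), d, e (4; 5)"
```

Each outermost balanced group is matched as a single chunk, so commas at every nesting level inside it are replaced in one pass while commas outside any parentheses stay as they are.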
Upvotes: 6