Reputation: 401
Consider a = paste(1:10,collapse=", ")
which results in
a = "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
I would like to replace every n-th (say, every 4th) occurrence of "," with something else (say "\n"). The desired output would be:
"1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
I am looking for code that uses gsub (or something equivalent) and some form of regular expression to achieve this goal.
Upvotes: 8
Views: 8033
Reputation: 499
This approach can also replace a whole string, not just a single character. I wrote a function that you can use easily :)
Here is a demo to help understand the regex:
> a = paste(1:10,collapse=", ")
> a
[1] "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
> # if you want the 2nd occurrence
> gsub("(.*?,.*?),(.*)", "\\1\n\\2", a)
[1] "1, 2\n 3, 4, 5, 6, 7, 8, 9, 10"
> # if you want the 3rd occurrence
> gsub("(.*?,.*?,.*?),(.*)", "\\1\n\\2", a)
[1] "1, 2, 3\n 4, 5, 6, 7, 8, 9, 10"
> # if you want the 4th occurrence
> gsub("(.*?,.*?,.*?,.*?),(.*)", "\\1\n\\2", a)
[1] "1, 2, 3, 4\n 5, 6, 7, 8, 9, 10"
> # if you want the last occurrence
> gsub("(.*,.*),(.*)", "\\1\n\\2", a)
[1] "1, 2, 3, 4, 5, 6, 7, 8, 9\n 10"
>
>
> replace.occurence <- function(x, pattern, replacement, which.occu) {
+ if( which.occu == "last" ) {
+ gsub(paste0("(.*", pattern, ".*)", pattern, "(.*)"), paste0("\\1", replacement, "\\2"), x)
+ } else {
+ gsub(paste0("(.*?", paste0(rep(paste0(pattern, ".*?"), which.occu - 1), collapse = ""), ")", pattern, "(.*)"), paste0("\\1", replacement, "\\2"), x)
+ }
+ }
>
> replace.occurence(a, pattern = ",", replacement = "\n", which.occu = 2)
[1] "1, 2\n 3, 4, 5, 6, 7, 8, 9, 10"
> replace.occurence(a, pattern = ",", replacement = "\n", which.occu = 3)
[1] "1, 2, 3\n 4, 5, 6, 7, 8, 9, 10"
> replace.occurence(a, pattern = ",", replacement = "\n", which.occu = 4)
[1] "1, 2, 3, 4\n 5, 6, 7, 8, 9, 10"
> replace.occurence(a, pattern = ",", replacement = "\n", which.occu = "last")
[1] "1, 2, 3, 4, 5, 6, 7, 8, 9\n 10"
>
> replace.occurence(a, pattern = ", 3, 4,", replacement = ", 4, 3,", which.occu = 1)
[1] "1, 2, 4, 3, 5, 6, 7, 8, 9, 10"
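Note that these patterns replace only a single (the n-th) occurrence, because the trailing (.*) consumes the rest of the string. Also, pattern is pasted into the regex as-is, so metacharacters such as "." or "|" are not matched literally. Below is a minimal sketch of a literal-safe variant. It assumes PCRE (perl = TRUE) so that \Q...\E quoting, (?s) and lazy quantifiers are available; the name replace_nth_literal is hypothetical.

```r
# Hypothetical literal-safe variant of replace.occurence() (illustrative name).
# Replaces only the n-th occurrence of `pattern`, treated as a literal string.
replace_nth_literal <- function(x, pattern, replacement, n) {
  # \Q...\E makes PCRE treat the pattern literally (assumes it contains no "\E")
  lit <- paste0("\\Q", pattern, "\\E")
  # (?s) lets . match newlines; (?:.*?lit){n-1} lazily skips the first n - 1
  # occurrences, and the trailing (.*) keeps everything after the n-th one
  re <- paste0("(?s)^((?:.*?", lit, "){", n - 1, "}.*?)", lit, "(.*)$")
  sub(re, paste0("\\1", replacement, "\\2"), x, perl = TRUE)
}

replace_nth_literal("a.b.c.d", ".", "|", 2)
#> [1] "a.b|c.d"
replace_nth_literal(paste(1:10, collapse = ", "), ",", "\n", 4)
#> [1] "1, 2, 3, 4\n 5, 6, 7, 8, 9, 10"
```

Backslashes in replacement are still interpreted by sub() as back-references, so they would need to be escaped.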
Upvotes: 0
Reputation: 93813
Using regmatches as yet another alternative:
a <- "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
fn <- ","
rp <- "\n"
n <- 4
regmatches(a, gregexpr(fn, a)) <- list(c(rep(fn,n-1),rp))
a
#[1] "1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
As a function:
a <- "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
replN <- function(x, fn, rp, n) {
regmatches(x, gregexpr(fn, x)) <- list(c(rep(fn,n-1),rp))
x
}
replN(a, ",", "\n", 4)
#[1] "1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
You could even extend this to be vectorised over the replacement argument:
a = "1, 2, 3, 4, 5, 6, 7, 8, 9, 10"
replN <- function(x,fn,rp,n) {
sel <- rep(fn, n*length(rp))
sel[seq_along(rp)*n] <- rp
regmatches(x, gregexpr(fn, x)) <- list(sel)
x
}
replN(a, fn=",", rp=c("1st","2nd"), n=4)
#[1] "1, 2, 3, 41st 5, 6, 7, 82nd 9, 10"
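Since gregexpr() treats fn as a regular expression, a separator such as "." or "|" would not be matched literally, even though the same fn is reused as a literal string in the replacement vector. A minimal sketch, assuming a literal separator is wanted, passes fixed = TRUE to gregexpr(); regmatches<- still recycles the replacement vector over all matches, which is what makes the every-n-th trick work. The name replN_fixed is hypothetical.

```r
# Hypothetical variant of replN() (illustrative name) that matches `fn`
# literally by passing fixed = TRUE to gregexpr()
replN_fixed <- function(x, fn, rp, n) {
  m <- gregexpr(fn, x, fixed = TRUE)
  # n - 1 unchanged copies of fn followed by rp; regmatches<- recycles this
  # vector over every match found in x
  regmatches(x, m) <- list(c(rep(fn, n - 1), rp))
  x
}

replN_fixed("a|b|c|d|e", "|", ";", 2)
#> [1] "a|b;c|d;e"
```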
Upvotes: 2
Reputation: 18357
You can replace ((?:\d+, ){3}\d+), with \1\n
This captures everything up to the fourth comma in group 1, matches that comma separately outside the group, and replaces the whole match with \1\n, i.e. the group-1 text followed by a newline. Because gsub resumes scanning after each match, every 4th comma is replaced, giving the intended result.
gsub("((?:\\d+, ){3}\\d+),", "\\1\n", "1, 2, 3, 4, 5, 6, 7, 8, 9, 10")
Prints:
[1] "1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
Edit:
To generalize the above solution to any text, we can change \d to [^,]:
gsub("((?:[^,]+, ){3}[^,]+),", "\\1\n", "1, 2, 3, 4, 5, 6, 7, 8, 9, 10")
gsub("((?:[^,]+, ){3}[^,]+),", "\\1\n", "a, bb, ccc, dddd, 500, 600, 700, 800, 900, 1000")
Output:
[1] "1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
[1] "a, bb, ccc, dddd\n 500, 600, 700, 800\n 900, 1000"
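The {3} quantifier is what fixes n = 4. A minimal sketch that builds the same [^,]+ pattern for an arbitrary n with sprintf() follows; the function name is hypothetical, and perl = TRUE is an assumption added here so that the (?:...) group is handled by PCRE. Like the pattern above, it assumes items are separated by a comma followed by a space.

```r
# Hypothetical helper (illustrative name): replace every n-th comma using the
# [^,]+ pattern above, with the repeat count computed from n
replace_every_nth_comma <- function(x, n, replacement = "\n") {
  # (n - 1) "item, " groups, one more item, then the n-th comma itself
  pattern <- sprintf("((?:[^,]+, ){%d}[^,]+),", n - 1)
  gsub(pattern, paste0("\\1", replacement), x, perl = TRUE)
}

a <- paste(1:10, collapse = ", ")
replace_every_nth_comma(a, 4)
#> [1] "1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
replace_every_nth_comma(a, 3)
#> [1] "1, 2, 3\n 4, 5, 6\n 7, 8, 9\n 10"
```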
Upvotes: 12
Reputation: 61154
Regex is the best alternative; nonetheless, here's another approach without regex:
> str_vec <- strsplit(a, " ")[[1]]
> where <- seq_along(str_vec) %% 4 == 0
> str_vec[where] <- sub(",", "\n", str_vec[where])
> paste(str_vec, collapse=" ")
[1] "1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
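The same idea can be written without relying on the spaces in the string. A minimal sketch, assuming a literal separator: split on the separator with fixed = TRUE, then paste the pieces back together, using the replacement after every n-th piece. The function name is hypothetical. Note that strsplit() drops a trailing empty piece, so a separator at the very end of the string would be lost.

```r
# Hypothetical non-regex helper (illustrative name)
join_every_nth <- function(x, sep, replacement, n) {
  # split on the literal separator (fixed = TRUE: no regex involved)
  pieces <- strsplit(x, sep, fixed = TRUE)[[1]]
  k <- length(pieces)
  if (k < 2) return(x)
  # one separator between each pair of neighbouring pieces ...
  seps <- rep(sep, k - 1)
  # ... except after every n-th piece, where the replacement is used
  seps[seq_len(k - 1) %% n == 0] <- replacement
  paste0(pieces, c(seps, ""), collapse = "")
}

join_every_nth(paste(1:10, collapse = ", "), ",", "\n", 4)
#> [1] "1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
```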
Upvotes: 1
Reputation: 922
Using both regex and gsub:
a = paste(1:10,collapse=", ")
x <- gsub("([^,]*,[^,]*,[^,]*,[^,]*),", '\\1\n', a)
x
#> [1] "1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10"
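The four copies of [^,]*, can also be generated rather than typed out. A minimal sketch, assuming base R 3.3.0 or later for strrep(); the helper name is hypothetical. Because gsub() is vectorised, the same pattern can be applied to several strings at once.

```r
# Hypothetical helper (illustrative name): build the pattern from the answer
# above for any n, i.e. "([^,]*,[^,]*,[^,]*,[^,]*)," when n = 4
every_nth_pattern <- function(n) {
  paste0("(", strrep("[^,]*,", n - 1), "[^,]*),")
}

x <- c(paste(1:10, collapse = ", "), "a, b, c, d, e, f")
gsub(every_nth_pattern(4), "\\1\n", x)
# returns c("1, 2, 3, 4\n 5, 6, 7, 8\n 9, 10", "a, b, c, d\n e, f")
```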
Upvotes: 1