Reputation: 1074
I need to split a string every five words (or so) in R. Given input:
x <- c("one, two, three, four, five, six, seven, eight, nine, ten")
I want output:
[1] "one, two, three, four, five"
[2] "six, seven, eight, nine, ten"
Is there a regex or function to accomplish this?
Upvotes: 2
Views: 2408
Reputation: 39154
Here is one possible approach. We can split the string into words, calculate the number of groups, and then use tapply and toString to generate the output.
x <- c("one, two, three, four, five, six, seven, eight, nine, ten")
# Split the string
y <- strsplit(x, split = ", ")[[1]]
# Number of complete groups of 5 words
group_num <- length(y) %/% 5
# Number of words left over after the complete groups
group_last <- length(y) %% 5
# Generate the output (seq_len also handles the case of fewer than 5 words)
z <- tapply(y, c(rep(seq_len(group_num), each = 5),
                 rep(group_num + 1, times = group_last)),
            toString)
z
1 2
"one, two, three, four, five" "six, seven, eight, nine, ten"
Notice that this solution works even if the number of words is not a multiple of 5. The following is an example.
x <- c("one, two, three, four, five, six, seven, eight, nine")
# Split the string
y <- strsplit(x, split = ", ")[[1]]
# Number of complete groups of 5 words
group_num <- length(y) %/% 5
# Number of words left over after the complete groups
group_last <- length(y) %% 5
# Generate the output (seq_len also handles the case of fewer than 5 words)
z <- tapply(y, c(rep(seq_len(group_num), each = 5),
                 rep(group_num + 1, times = group_last)),
            toString)
z
1 2
"one, two, three, four, five" "six, seven, eight, nine"
Upvotes: 3
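The grouping vector in the tapply answer above can also be built in one step from the word positions. This is a minimal sketch rather than the answerer's own code; the names group and z are illustrative, and it assumes the words are separated by exactly ", ".

```r
x <- "one, two, three, four, five, six, seven, eight, nine"
y <- strsplit(x, split = ", ", fixed = TRUE)[[1]]

# Words 1-5 fall in group 1, words 6-10 in group 2, and so on;
# a shorter final group needs no special handling
group <- ceiling(seq_along(y) / 5)

# Paste each group back together with ", " and drop the group names
z <- unname(vapply(split(y, group), toString, character(1)))
z
```

Because the group index comes from seq_along(y), the same code covers inputs with fewer than five words.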
Reputation: 13581
An alternative approach: find every fifth instance of the pattern ",", replace it with an arbitrary character, then split the string on that character.
x <- c("one, two, three, four, five, six, seven, eight, nine, ten")
library(stringr)
pattern <- ","
numobs <- 5 # number of words per chunk
index <- as.data.frame(str_locate_all(x, pattern)) # find all positions of pattern
index <- index[seq(numobs, nrow(index), by = numobs), ]$start # keep every fifth instance of pattern
stopifnot(grepl("!", x)==FALSE) # throws error in case arbitrary symbol to split on is already present
str_sub(x, index, index) <- "!" # arbitrary symbol to split on
ans <- unlist(strsplit(x, "! ")) # split on new symbol
# [1] "one, two, three, four, five"
# [2] "six, seven, eight, nine, ten"
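Since the question asks about a regex, the same idea (mark every fifth separator, then split on the marker) can be sketched in base R with a single gsub call. This is only an illustration under stated assumptions: the words contain no commas, the separator is exactly ", ", and the newline used as the marker does not already occur in x. The name marked is hypothetical.

```r
x <- "one, two, three, four, five, six, seven, eight, nine, ten"

# Guard: the marker character must not already be present in the input
stopifnot(!grepl("\n", x, fixed = TRUE))

# Capture five comma-separated words, then swap the separator that
# follows them for a newline
marked <- gsub("((?:[^,]+, ){4}[^,]+), ", "\\1\n", x, perl = TRUE)

# Split on the marker
ans <- strsplit(marked, "\n", fixed = TRUE)[[1]]
ans
```

A trailing group of fewer than five words is left as the last element, because the pattern only matches when a separator follows a full group of five.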
Upvotes: 0
Reputation: 16089
Here's a function that works for a length-one x.
x <- c("one, two, three, four, five, six, seven, eight, nine, ten")
#' @param x Vector
#' @param n Number of elements in each vector
#' @param pattern Pattern to split on
#' @param ... Passed to strsplit
#' @param collapse String to collapse the result into
split_every <- function(x, n, pattern, collapse = pattern, ...) {
  x_split <- strsplit(x, pattern, perl = TRUE, ...)[[1]]
  out <- character(ceiling(length(x_split) / n))
  for (i in seq_along(out)) {
    entry <- x_split[seq((i - 1) * n + 1, i * n, by = 1)]
    out[i] <- paste0(entry[!is.na(entry)], collapse = collapse)
  }
  out
}
library(testthat)
expect_equal(split_every(x, 5, pattern = ", "),
             c("one, two, three, four, five",
               "six, seven, eight, nine, ten"))
Upvotes: 3
Reputation: 5068
Were you after something like this:
lapply(1:ceiling(length(x)/5), function(i) x[(5*(i-1)+1):min(length(x),(5*i))])
i.e. you don't know the length of your vector x in advance, but you want to be able to deal with any eventuality?
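The one-liner above indexes x element by element, so it applies when x is already a vector of individual words rather than the single string from the question. A rough sketch of how it might be combined with a split step follows; the names words, chunks and result are hypothetical, and the separator is assumed to be exactly ", ".

```r
x <- "one, two, three, four, five, six, seven, eight, nine, ten"

# Turn the single string into a vector of words first
words <- strsplit(x, ", ", fixed = TRUE)[[1]]

# Take the words five at a time; min() stops the last chunk at the end
chunks <- lapply(seq_len(ceiling(length(words) / 5)), function(i) {
  words[(5 * (i - 1) + 1):min(length(words), 5 * i)]
})

# Collapse each chunk back into one comma-separated string
result <- vapply(chunks, toString, character(1))
result
```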
Upvotes: 0