I don't know the proper technical terms for this kind of operation, so it has been difficult to search for existing solutions. I thought I would try to post my own question and hopefully someone can help me out (or point me in the right direction).
I have a character vector and I want to concatenate overlapping groups of two and three consecutive elements. To illustrate, here is a simplified version:
The vector I have:
"a" "b" "c" "d" "e" "f"
I want to run through the vector and concatenate groups of two and three elements. This is the end result I want:
"a b" "b c" "c d" "d e" "e f"
And
"a b c" "b c d" "c d e" "d e f"
I solved this the simplest and dirtiest way possible by using for-loops, but it takes a long time to run and I am convinced it can be done more efficiently.
Here is my quick-and-dirty hack:
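t1 <- c("a", "b", "c", "d", "e", "f")

# seq_len() avoids the precedence trap in 1:length(t1)-1, which starts at 0
# Pairs: element i joined with element i + 1
t2 <- character(length(t1) - 1)
for (i in seq_len(length(t1) - 1)) {
  t2[i] <- paste(t1[i], t1[i + 1])
}

# Triples: elements i, i + 1 and i + 2
t3 <- character(length(t1) - 2)
for (i in seq_len(length(t1) - 2)) {
  t3[i] <- paste(t1[i], t1[i + 1], t1[i + 2])
}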
I looked into sapply, tapply, etc., but I can't figure out how to refer to "the following element" of the vector.
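To use "the following element" inside an apply-style function, one option is to iterate over positions rather than over the values themselves, so the anonymous function can reach x[i + 1]. A minimal sketch (the helper name pair_up is hypothetical, not from any package):

```r
t1 <- c("a", "b", "c", "d", "e", "f")

# Loop over positions 1..(n - 1); each call can then look ahead to x[i + 1].
# vapply() is used instead of sapply() so the result is guaranteed to be character.
pair_up <- function(x) {
  idx <- seq_len(max(length(x) - 1, 0))  # empty for vectors of length 0 or 1
  vapply(idx, function(i) paste(x[i], x[i + 1]), character(1))
}

pair_up(t1)
```

This mainly shows how to express the look-ahead; the work is still done one element at a time, so it should not be expected to run much faster than the for-loop.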
Any help will be rewarded with my eternal gratitude!
-------------- Edit --------------
Run times of the suggestions on input data with ~3 million rows (timestamps as recorded; elapsed time since the previous timestamp in parentheses):
START: 2016-11-20 19:24:50 CET
For-loop: 2016-11-20 19:28:26 CET (3 min 36 s)
rollapply: 2016-11-20 19:38:55 CET (10 min 29 s)
apply(matrix): 2016-11-20 19:42:15 CET (3 min 20 s)
paste t1[-length...]: 2016-11-20 19:42:37 CET (22 s)
grep: 2016-11-20 19:44:30 CET (1 min 53 s)
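Wall-clock timestamps also capture whatever else the session is doing. A slightly more controlled sketch times each approach separately with base R's system.time() and checks that the results agree; the vector size and the random input are assumptions, not the data used above.

```r
set.seed(1)
# Synthetic input: 1e5 random lowercase letters (assumed size, not the original data)
x <- sample(letters, 1e5, replace = TRUE)

# Loop version of the pairing step
loop_pairs <- function(x) {
  out <- character(length(x) - 1)
  for (i in seq_len(length(x) - 1)) out[i] <- paste(x[i], x[i + 1])
  out
}

# Vectorized version: drop the last / first element and paste element-wise
vec_pairs <- function(x) paste(x[-length(x)], x[-1])

t_loop <- system.time(r_loop <- loop_pairs(x))
t_vec  <- system.time(r_vec  <- vec_pairs(x))

# Both approaches must agree before their timings are worth comparing
stopifnot(identical(r_loop, r_vec))
print(rbind(loop = t_loop[["elapsed"]], vectorized = t_vec[["elapsed"]]))
```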
Have you considered the zoo package? For example
library(zoo)
input <- c("a", "b", "c", "d", "e", "f")
# collapse = " " is passed through to paste()
output <- rollapply(data = input, width = 2, FUN = paste, collapse = " ")
output
will return
"a b" "b c" "c d" "d e" "e f"
The width argument controls how many elements are concatenated. I expect this will improve run times too, but I haven't tested it.
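If you would rather avoid the extra package, base R's embed() (from stats) builds the same sliding windows as a matrix. Each row holds one window in reverse order, so the columns are flipped before pasting. A sketch, where the hypothetical helper windows() takes a width argument that plays the same role as rollapply's:

```r
input <- c("a", "b", "c", "d", "e", "f")

# embed(x, k) returns a (length(x) - k + 1) x k matrix whose row i is
# x[i + k - 1], ..., x[i], i.e. each window reversed.
windows <- function(x, width) {
  m <- embed(x, width)[, width:1, drop = FALSE]  # restore left-to-right order
  apply(m, 1, paste, collapse = " ")
}

windows(input, 2)
windows(input, 3)
```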
For groups of two, we can do this with
paste(t1[-length(t1)], t1[-1])
#[1] "a b" "b c" "c d" "d e" "e f"
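The same idea extends to any group size: build one shifted copy of the vector per position in the group and let paste() combine them element-wise, which keeps everything vectorized. A sketch (the function name ngrams is hypothetical, and it assumes n <= length(x)):

```r
t1 <- c("a", "b", "c", "d", "e", "f")

ngrams <- function(x, n) {
  m <- length(x) - n + 1          # number of complete windows
  starts <- seq_len(m)
  # For offset k, x[starts + k] is the (k + 1)-th element of every window
  cols <- lapply(0:(n - 1), function(k) x[starts + k])
  do.call(paste, cols)            # paste() works element-wise across the columns
}

ngrams(t1, 2)  # same result as paste(t1[-length(t1)], t1[-1])
ngrams(t1, 3)
```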
For groups of three or more, one option is shift from data.table:
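library(data.table)
# shift(type = "lead") returns t1 shifted by 0, 1 and 2 positions, padded at the
# end with NA; paste() then joins the three vectors element-wise
v1 <- do.call(paste, shift(t1, 0:2, type = "lead"))
# drop the incomplete groups at the end, which contain "NA"
grep("NA", v1, invert = TRUE, value = TRUE)
#[1] "a b c" "b c d" "c d e" "d e f"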
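One caveat with grep("NA", ...): it drops any string containing the letters NA, including legitimate values such as "NAME". Because the padded entries are always the last n - 1 results, trimming by position avoids that. The sketch below uses a hypothetical base-R helper lead() that mimics shift(type = "lead"), so it runs without data.table:

```r
t1 <- c("NAME", "b", "c", "d", "e", "f")

# Hypothetical stand-in for data.table::shift(x, k, type = "lead"):
# drop the first k elements and pad the end with NA
lead <- function(x, k) c(x[seq_along(x) > k], rep(NA, k))

n <- 3
v1 <- do.call(paste, lapply(0:(n - 1), function(k) lead(t1, k)))

# grep() would also throw away the legitimate first group "NAME b c"
grep("NA", v1, invert = TRUE, value = TRUE)

# The last n - 1 entries hold the NA padding; keep the rest by position
v1[seq_len(length(v1) - (n - 1))]
```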
Or
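n <- length(t1)
n1 <- 3  # group size
# rows 1 .. n - (n1 - 1) of the recycled matrix hold the complete groups
apply(matrix(t1, ncol = n1, nrow = n + 1)[seq(n - (n1 - 1)), ], 1, paste, collapse = " ")
#[1] "a b c" "b c d" "c d e" "d e f"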
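The matrix() call relies on R's recycling: filling a matrix with n + 1 rows column by column from a length-n vector shifts each successive column by one element, so row i reads t1[i], t1[i + 1], t1[i + 2]. R emits a warning about the data length not matching the number of rows; that warning is expected here. A small sketch to inspect the intermediate matrix:

```r
t1 <- c("a", "b", "c", "d", "e", "f")
n <- length(t1)
n1 <- 3

# suppressWarnings() hides the expected data-length warning from matrix()
m <- suppressWarnings(matrix(t1, ncol = n1, nrow = n + 1))
m
# Only the first n - (n1 - 1) rows are complete windows; later rows wrap
# around to the start of t1 (row 5 is "e" "f" "a")
m[seq(n - (n1 - 1)), ]
```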