Reputation: 15141
I have a list of lists and I want to "ngram" it: to each row I want to append the N-1 rows that follow it, where N is a number passed as an argument. For example, given:
1 2 3
4 5 6
7 8 9
1 2 3
n=2 would give me a matrix with only 3 rows (rows - n + 1):
1 2 3 4 5 6 // row1+row2
4 5 6 7 8 9 // row2+row3
7 8 9 1 2 3 // row3+row4
For n=3:
1 2 3 4 5 6 7 8 9 // row1+row2+row3
4 5 6 7 8 9 1 2 3 // row2+row3+row4
For n=4 it would just return 1 row with all rows concatenated; for n>4 it would fail.
I have fairly straightforward R code to do this (R newbie here):
ngram <- function(inp, window){
  rows <- dim(inp)[1]
  cols <- dim(inp)[2]
  resRows <- rows - window + 1
  res <- c()
  for(idx in 1:resRows) {
    newRow <- inp[idx,]
    for(ii in 1:(window-1)) {
      newRow <- c(newRow, inp[idx+ii,])
    }
    res <- rbind(res,newRow)
  }
  return(res)
}
iot <- read.csv("resources/data.csv")
iot <- ngram(iot, 5)
The problem, I think, is with c(newRow, inp[idx+ii,]), which is extremely slow if I use, for example, n=10. Is there a better way to do what I want to do?
Upvotes: 1
Views: 74
Reputation: 38500
An alternative method uses matrix to build a new matrix from the individual elements.
matSplat <- function(myMat, n) {
  # get a list of the rows to combine
  rows <- lapply(seq_len(nrow(myMat)-(n-1)), function(i) i:(i+n-1))
  # transpose the matrix
  myMat.t <- t(myMat)
  # build up the new matrix
  matrix(unlist(lapply(rows, function(i) myMat.t[,i])), nrow(myMat)-(n-1), byrow=TRUE)
}
This results in
matSplat(myMat, 2)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    4    5    6    7    8    9
[3,]    7    8    9    1    2    3
matSplat(myMat, 3)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    2    3    4    5    6    7    8    9
[2,]    4    5    6    7    8    9    1    2    3
data
myMat <- matrix(c(1:9, 1:3), ncol=3, byrow = TRUE)
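The transpose in matSplat matters because R stores matrices in column-major order: after t(), the original rows become columns, so flattening them emits each source row's elements contiguously. A minimal sketch of that one step, using the same myMat (block, flat and interleaved are illustrative names only):

```r
myMat <- matrix(c(1:9, 1:3), ncol = 3, byrow = TRUE)

# Rows 1 and 2 of myMat become columns 1 and 2 of the transpose
block <- t(myMat)[, 1:2]

# Flattening walks down the columns, so this is row 1 followed by row 2
flat <- as.vector(block)
print(flat)         # 1 2 3 4 5 6

# Without the transpose, the elements of the two rows come out interleaved
interleaved <- as.vector(myMat[1:2, ])
print(interleaved)  # 1 4 2 5 3 6
```
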
Upvotes: 2
Reputation: 12401
Let's assume you have the following matrix:
a <- matrix(1:12, 4, 3, byrow = TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
You can get what you want by using cbind (n=2 as in your example):
cbind(a[1:(nrow(a) - 1),], a[2:nrow(a),])
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    4    5    6    7    8    9
[3,]    7    8    9   10   11   12
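For n=3 the same idea needs three shifted blocks rather than two: rows 1 to 2, rows 2 to 3 and rows 3 to 4 of the same matrix. A small sketch, with res3 as an illustrative name:

```r
a <- matrix(1:12, 4, 3, byrow = TRUE)

# Three slices of a, each starting one row later, bound side by side
res3 <- cbind(a[1:2, ], a[2:3, ], a[3:4, ])
print(res3)
```

This should give a 2 x 9 matrix whose first row is rows 1, 2 and 3 of a concatenated, and whose second row is rows 2, 3 and 4.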
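If I understand your ngram function correctly, it could be rewritten along these lines, binding one shifted copy of the input per position in the window:
ngram <- function(inp, window){
  N <- nrow(inp)
  # block k holds rows k..(N - window + k); drop = FALSE keeps a single row as a matrix
  blocks <- lapply(seq_len(window), function(k) inp[k:(N - window + k), , drop = FALSE])
  do.call(cbind, blocks)
}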
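To check a vectorised version against a row-by-row loop, and to get a feel for the speed difference the question asks about, something like the following could be used. This is only a sketch: ngram_loop and ngram_shift are hypothetical names, the input is assumed to be a numeric matrix (a data frame from read.csv could be converted with as.matrix first), and timings will vary by machine.

```r
# Loop version: builds the result one row at a time, growing it with rbind
ngram_loop <- function(inp, window) {
  res <- NULL
  for (idx in seq_len(nrow(inp) - window + 1)) {
    # as.vector on the transposed slice concatenates the rows in order
    res <- rbind(res, as.vector(t(inp[idx:(idx + window - 1), , drop = FALSE])))
  }
  res
}

# Vectorised version: bind `window` shifted copies of the matrix side by side
ngram_shift <- function(inp, window) {
  N <- nrow(inp)
  stopifnot(window >= 1, window <= N)
  blocks <- lapply(seq_len(window), function(k) inp[k:(N - window + k), , drop = FALSE])
  do.call(cbind, blocks)
}

set.seed(1)
m <- matrix(rnorm(2000 * 3), ncol = 3)

# Both versions should agree element for element
stopifnot(isTRUE(all.equal(ngram_loop(m, 10), ngram_shift(m, 10), check.attributes = FALSE)))

# Rough timing comparison
print(system.time(ngram_loop(m, 10)))
print(system.time(ngram_shift(m, 10)))
```

The loop version reallocates the result on every rbind call, which is the usual reason this pattern gets slow as the number of rows grows; the shifted version does a fixed number of slices regardless of the row count.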
Upvotes: 4