Mateusz Dymczyk

Reputation: 15141

Combining rows in a matrix

I have a list of lists and I want to "ngram" it, meaning that to each row I want to append the N-1 following rows (where N is a number passed as an argument), e.g.:

1 2 3
4 5 6
7 8 9
1 2 3

With n=2 this would give me a matrix with only 3 rows (rows - n + 1):

1 2 3 4 5 6 // row1+row2
4 5 6 7 8 9 // row2+row3
7 8 9 1 2 3 // row3+row4

For n=3:

1 2 3 4 5 6 7 8 9 // row1+row2+row3
4 5 6 7 8 9 1 2 3 // row2+row3+row4

For n=4 it would just return 1 row with all rows concatenated; for n>4 it would fail.

I have fairly straightforward code in R to do this (R newbie here):

ngram <- function(inp, window){
    rows <- nrow(inp)
    resRows <- rows - window + 1

    res <- c()

    # seq_len() avoids the 1:0 trap when resRows or window - 1 is zero
    for(idx in seq_len(resRows)) {
        newRow <- inp[idx,]
        for(ii in seq_len(window - 1)) {
            newRow <- c(newRow, inp[idx+ii,])
        }
        res <- rbind(res,newRow)
    }
    return(res)
}

iot <- read.csv("resources/data.csv")
iot <- ngram(iot, 5)

The problem, I think, is with c(newRow, inp[idx+ii,]), which is extremely slow if I use, for example, n=10. Is there a better way to do this?

Upvotes: 1

Views: 74

Answers (2)

lmo

Reputation: 38500

An alternative method uses matrix() to build a new matrix from the individual elements.

matSplat <- function(myMat, n) {
  # get a list of the rows to combine
  rows <- lapply(seq_len(nrow(myMat)-(n-1)), function(i) i:(i+n-1))
  # transpose the matrix
  myMat.t <- t(myMat)
  # build up the new matrix
  matrix(unlist(lapply(rows, function(i) myMat.t[,i])), nrow(myMat)-(n-1), byrow=TRUE)
}

This results in

matSplat(myMat, 2)
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    4    5    6    7    8    9
[3,]    7    8    9    1    2    3
matSplat(myMat, 3)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,]    1    2    3    4    5    6    7    8    9
[2,]    4    5    6    7    8    9    1    2    3

data

myMat <- matrix(c(1:9, 1:3), ncol=3, byrow = TRUE)

Upvotes: 2

Pop

Reputation: 12401

Let's assume you have the following matrix

a <- matrix(1:12, 4, 3, byrow = TRUE)

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12

You can get what you want by using cbind (n=2 as in your example)

cbind(a[1:(nrow(a) - 1),], a[2:nrow(a),])

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    4    5    6    7    8    9
[3,]    7    8    9   10   11   12
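
For n=3 the same pattern needs three shifted slices, one per offset in the window, not just the first and the last. A minimal sketch on the same matrix (redefined here so the snippet runs on its own; the name n3 is only for illustration):

```r
a <- matrix(1:12, 4, 3, byrow = TRUE)

# One slice per offset: rows 1-2, rows 2-3 and rows 3-4, bound side by side
n3 <- cbind(a[1:(nrow(a) - 2), ],
            a[2:(nrow(a) - 1), ],
            a[3:nrow(a), ])
n3
```

This should give two rows, 1 to 9 and 4 to 12.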

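If I understand your ngram function correctly, it can be rewritten this way, with one shifted slice per offset in the window (assuming inp is a matrix)

ngram <- function(inp, window){
   N <- nrow(inp)
   stopifnot(window >= 1, window <= N)
   # slice k holds rows k..(N - window + k); drop = FALSE keeps one-row slices as matrices
   slices <- lapply(seq_len(window), function(k) inp[k:(N - window + k), , drop = FALSE])
   do.call(cbind, slices)
}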
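
On the speed issue in the question: part of the slowness of the loop version likely comes from growing res with rbind on every iteration, and from the fact that read.csv returns a data.frame, so each inp[idx,] is a one-row data frame rather than a plain vector. Below is a rough sketch, not a definitive implementation, that converts the input to a matrix once and fills a preallocated result block by block. The name ngram_prealloc is hypothetical, and it assumes all columns share one type (e.g. numeric).

```r
# Hypothetical helper: sliding-window row concatenation into a preallocated matrix
ngram_prealloc <- function(inp, window) {
  m <- as.matrix(inp)  # assumption: columns share one type, e.g. all numeric
  n_rows <- nrow(m)
  n_cols <- ncol(m)
  stopifnot(window >= 1, window <= n_rows)

  out_rows <- n_rows - window + 1
  # Allocate the full result once instead of growing it with rbind
  res <- matrix(NA, nrow = out_rows, ncol = n_cols * window)

  # The loop runs `window` times, not once per row
  for (k in seq_len(window)) {
    cols <- ((k - 1) * n_cols + 1):(k * n_cols)
    res[, cols] <- m[k:(k + out_rows - 1), , drop = FALSE]
  }
  res
}

m <- matrix(c(1:9, 1:3), ncol = 3, byrow = TRUE)
ngram_prealloc(m, 3)
```

On the 4x3 example from the question this should reproduce the rows listed there for n=2, n=3 and n=4, and it stops with an error for n>4.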

Upvotes: 4
