Danielle

Moving window method to aggregate data

I have the matrix below:

mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v", "x", "z"),
                      c("1", "2", "3", "4", "5", "6"))

I need to aggregate columns using a moving window. First, the window size is 2, so each window consists of 2 adjacent columns, and I take the row sums over those columns. The window then shifts by one column and the row sums are taken again. For the example matrix, the first window aggregates columns 1 and 2, the second combines columns 2 and 3, then 3 and 4, 4 and 5, and finally 5 and 6.

These results (the row sums for each window) are collected into a matrix in which the rows are preserved and each column holds the result for one window.
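For the width-2 case specifically, one minimal base-R sketch (my own alternative, not part of the method described above) is to add each column to its right-hand neighbour, which gives all five window sums in one vectorised step. The name w2 is illustrative:

```r
# Example matrix from the question
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a","c","f","h","i","j","l","m","p","q","s","t","u","v","x","z"),
                      as.character(1:6))

# mat[, -ncol(mat)] holds columns 1..5 and mat[, -1] holds columns 2..6,
# so element-wise addition gives the row sums of windows 1_2, 2_3, ..., 5_6
w2 <- mat[, -ncol(mat)] + mat[, -1]

# Name each result column after the pair of columns it combines
colnames(w2) <- paste(colnames(mat)[-ncol(mat)], colnames(mat)[-1], sep = "_")

w2[c("a", "x"), ]  # rows a and x: 3 2 0 0 1 and 3 1 0 0 0
```

The rows keep their original names, so the result has the same row layout as mat. This trick only covers width 2; larger windows need a more general approach.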

Next, the window size increases to 3, so 3 columns are combined (summed). Again, the window shifts by one column at a time. For the example matrix, the first window aggregates columns 1-2-3, the second combines columns 2-3-4, then 3-4-5 and 4-5-6. These results go into a separate matrix.
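The same step for an arbitrary width k can be sketched with a small helper. window_sums() is a hypothetical name introduced here, not an existing function, and it assumes the input matrix has column names to build the labels from:

```r
# Example matrix from the question
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a","c","f","h","i","j","l","m","p","q","s","t","u","v","x","z"),
                      as.character(1:6))

# Hypothetical helper: row sums over every window of k adjacent columns.
# Returns one column per window, named after the columns it combines.
window_sums <- function(m, k) {
  starts <- seq_len(ncol(m) - k + 1)  # first column of each window
  # vapply returns a nrow(m) x length(starts) matrix, one column per window
  out <- vapply(starts,
                function(i) rowSums(m[, i:(i + k - 1), drop = FALSE]),
                numeric(nrow(m)))
  labels <- vapply(starts,
                   function(i) paste(colnames(m)[i:(i + k - 1)], collapse = "_"),
                   character(1))
  dimnames(out) <- list(rownames(m), labels)
  out
}

w3 <- window_sums(mat, 3)
colnames(w3)  # "1_2_3" "2_3_4" "3_4_5" "4_5_6"
```

Calling it with k = 2 reproduces the width-2 result, and with k = ncol(mat) it returns a single column holding the full row sums.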

The window size keeps increasing until the window spans all columns. In this example, the largest window combines all 6 plots (columns).
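As a quick check on how many results to expect: with n columns and window width k there are n - k + 1 windows, so the widths 2 to n give n - 1 result matrices. A short sketch of that arithmetic, where the 600-column case is the hypothetical one mentioned in the question:

```r
# Number of windows for each width k when there are n_cols columns: n_cols - k + 1
n_cols <- 6
widths <- 2:n_cols
n_windows <- n_cols - widths + 1
n_windows        # 5 4 3 2 1, one result matrix per width
sum(n_windows)   # 15 result columns across all matrices

# The same count for a hypothetical 600-column data set
widths600 <- 2:600
total600 <- sum(600 - widths600 + 1)
total600         # 179700 result columns, spread over 599 matrices
```

Each of those result columns has one entry per row, so with many columns the total output grows roughly with the square of the column count.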

Below are the result matrices for window sizes 2 and 3, computed from the example matrix mat above. Columns are named after the columns that were added.

# Window length = 2
mat1 <- matrix(c(3,0,0,0,1,0,1,0,0,0,0,0,0,0,3,0,
                 2,0,1,1,2,0,0,0,0,0,0,0,0,0,1,0,
                 0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,
                 0,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,
                 1,1,0,0,1,0,0,1,1,1,2,2,1,1,0,1), nrow = 16)
dimnames(mat1) <- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v", "x", "z"),
                       c("1_2", "2_3", "3_4", "4_5", "5_6"))

# Window length = 3
mat8 <- matrix(c(3,0,1,1,2,0,1,0,0,0,0,0,0,0,3,0,
                 2,1,1,1,2,1,0,0,0,0,0,0,0,0,1,0,
                 0,1,1,1,2,1,0,1,0,1,1,0,0,1,0,1,
                 1,2,0,0,1,1,0,1,1,1,2,2,1,1,0,1), nrow = 16)
dimnames(mat8) <- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v", "x", "z"),
                       c("1_2_3", "2_3_4", "3_4_5", "4_5_6"))

My example has 6 columns, so there would be 5 result matrices in total. If I had 600 columns of data, I think a loop would be the most efficient way to iterate over such a large dataset.
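One way to keep such a loop cheap, sketched here under the assumption that the data are numeric, is the prefix-sum (cumulative sum) technique, which is my substitution and not part of the question or the answer: compute cumulative row sums once, and then every window sum is a single subtraction of two columns instead of a fresh rowSums over k columns. The names cs and results are illustrative:

```r
# Example matrix from the question
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a","c","f","h","i","j","l","m","p","q","s","t","u","v","x","z"),
                      as.character(1:6))

# Cumulative sums across columns, with a leading column of zeros, so that
# the sum of columns i..j equals cs[, j + 1] - cs[, i]
cs <- cbind(0, t(apply(mat, 1, cumsum)))

results <- vector("list", ncol(mat) - 1)  # one matrix per window width 2..ncol
for (k in 2:ncol(mat)) {
  starts <- seq_len(ncol(mat) - k + 1)    # first column of each window
  ends <- starts + k - 1                  # last column of each window
  w <- cs[, ends + 1, drop = FALSE] - cs[, starts, drop = FALSE]
  dimnames(w) <- list(rownames(mat),
                      vapply(starts,
                             function(i) paste(colnames(mat)[i:(i + k - 1)], collapse = "_"),
                             character(1)))
  results[[k - 1]] <- w
}
names(results) <- paste0("width_", 2:ncol(mat))

results[["width_3"]]["c", ]  # 0 1 1 2
```

With non-integer data, differences of cumulative sums can carry small floating-point rounding errors compared with summing each window directly. For 600 columns the loop still produces 599 matrices, so memory for the results may matter more than the speed of computing them.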

Answers (1)

Ronak Shah

Here is one way to do it in base R:

# j + 1 is the window width; i is the first column of each window
lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind, 
   lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)]))))


#[[1]]
#  [,1] [,2] [,3] [,4] [,5]
#a    3    2    0    0    1
#c    0    0    1    1    1
#f    0    1    1    0    0
#h    0    1    1    0    0
#i    1    2    1    1    1
#j    0    0    1    1    0
#l    1    0    0    0    0
#m    0    0    0    1    1
#p    0    0    0    0    1
#q    0    0    0    1    1
#s    0    0    0    1    2
#t    0    0    0    0    2
#u    0    0    0    0    1
#v    0    0    0    1    1
#x    3    1    0    0    0
#z    0    0    0    1    1

#[[2]]
#  [,1] [,2] [,3] [,4]
#a    3    2    0    1
#c    0    1    1    2
#f    1    1    1    0
#h    1    1    1    0
#i    2    2    2    1
#j    0    1    1    1
#l    1    0    0    0
#m    0    0    1    1
#p    0    0    0    1
#q    0    0    1    1
#s    0    0    1    2
#t    0    0    0    2
#u    0    0    0    1
#v    0    0    1    1
#x    3    1    0    0
#z    0    0    1    1
#....
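The base R code above returns columns labelled [,1], [,2], and so on. If the column names from the question (1_2, 2_3_4, ...) are wanted, here is a sketch that keeps the same lapply/rowSums structure and adds the labels; the width_ list names are an illustrative choice:

```r
# Example matrix from the question
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a","c","f","h","i","j","l","m","p","q","s","t","u","v","x","z"),
                      as.character(1:6))

res <- lapply(seq_len(ncol(mat) - 1), function(j) {
  starts <- seq_len(ncol(mat) - j)  # first column of each window of width j + 1
  out <- do.call(cbind, lapply(starts, function(i) rowSums(mat[, i:(i + j)])))
  # Name each column after the columns it sums, e.g. "1_2" or "2_3_4"
  colnames(out) <- vapply(starts,
                          function(i) paste(colnames(mat)[i:(i + j)], collapse = "_"),
                          character(1))
  out
})
names(res) <- paste0("width_", 2:ncol(mat))

colnames(res$width_2)  # "1_2" "2_3" "3_4" "4_5" "5_6"
```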

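As this is a rolling operation, we can also use rollapply from the zoo package with a variable window width:

# rollapply slides a window of width j over the column indices 1..ncol(mat);
# t() turns each result back into rows = original rows, columns = windows
lapply(2:ncol(mat), function(j)
    t(zoo::rollapply(seq_len(ncol(mat)), j, function(x) rowSums(mat[,x]))))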
