Reputation: 795
I have the matrix below:
mat<- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow=16, ncol=6)
dimnames(mat)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1", "2", "3", "4", "5", "6"))
I need to aggregate columns using a moving window. First, the window size is 2, so each window comprises 2 adjacent columns, and row sums are taken over each window. The window then shifts by one column and row sums are taken again. For the example matrix above, the first window combines columns 1 and 2, the second combines columns 2 and 3, then 3 and 4, 4 and 5, and 5 and 6.
These results (the row sums for each window) are put into a matrix. In this matrix the rows are preserved and each column holds the result for one window.
Next, the window size increases to 3, so that 3 columns of data are combined (summed). As before, the window shifts by one column at a time. For the example matrix above, the first window combines columns 1-2-3, the second combines columns 2-3-4, then 3-4-5 and 4-5-6. These results are put into a separate matrix.
The window size continues to increase until the window spans all columns. In this example, the largest window combines all 6 columns.
Below are the result matrices for window sizes 2 and 3, given the example matrix mat above. Columns are named according to the columns that were added.
# Window length = 2
mat1<- matrix( c(3,0,0,0,1,0,1,0,0,0,0,0,0,0,3,0,
2,0,1,1,2,0,0,0,0,0,0,0,0,0,1,0,
0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,
0,1,0,0,1,1,0,1,0,1,1,0,0,1,0,1,
1,1,0,0,1,0,0,1,1,1,2,2,1,1,0,1), nrow=16)
dimnames(mat1)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1_2", "2_3", "3_4", "4_5", "5_6"))
# Window length = 3
mat8<- matrix( c(3,0,1,1,2,0,1,0,0,0,0,0,0,0,3,0,
2,1,1,1,2,1,0,0,0,0,0,0,0,0,1,0,
0,1,1,1,2,1,0,1,0,1,1,0,0,1,0,1,
1,2,0,0,1,1,0,1,1,1,2,2,1,1,0,1), nrow=16)
dimnames(mat8)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
c("1_2_3", "2_3_4", "3_4_5", "4_5_6"))
In my example I have 6 columns, so there would be 5 result matrices in total. If I had 600 columns of data, I think a loop would be the most efficient way to iterate over such a large dataset.
Upvotes: 0
Views: 184
Reputation: 389175
Here is one way in base R, where j is the window width minus one and i is the first column of each window:
lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind,
lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)]))))
#[[1]]
# [,1] [,2] [,3] [,4] [,5]
#a 3 2 0 0 1
#c 0 0 1 1 1
#f 0 1 1 0 0
#h 0 1 1 0 0
#i 1 2 1 1 1
#j 0 0 1 1 0
#l 1 0 0 0 0
#m 0 0 0 1 1
#p 0 0 0 0 1
#q 0 0 0 1 1
#s 0 0 0 1 2
#t 0 0 0 0 2
#u 0 0 0 0 1
#v 0 0 0 1 1
#x 3 1 0 0 0
#z 0 0 0 1 1
#[[2]]
# [,1] [,2] [,3] [,4]
#a 3 2 0 1
#c 0 1 1 2
#f 1 1 1 0
#h 1 1 1 0
#i 2 2 2 1
#j 0 1 1 1
#l 1 0 0 0
#m 0 0 1 1
#p 0 0 0 1
#q 0 0 1 1
#s 0 0 1 2
#t 0 0 0 2
#u 0 0 0 1
#v 0 0 1 1
#x 3 1 0 0
#z 0 0 1 1
#....
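The question also asks for result columns named after the columns that were added (for example "1_2"). A minimal sketch of how that could be done with the same nested approach is below; window_sums() and res_named are hypothetical names introduced only for illustration, and the code assumes the input matrix has column names.

```r
# Example matrix from the question (16 rows, 6 columns)
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a", "c", "f", "h", "i", "j", "l", "m",
                        "p", "q", "s", "t", "u", "v", "x", "z"),
                      as.character(1:6))

# window_sums() is a hypothetical helper, not a base R function.
# It returns one column of row sums per window of `width` adjacent columns.
window_sums <- function(m, width) {
  starts <- seq_len(ncol(m) - width + 1)
  res <- do.call(cbind, lapply(starts, function(i) {
    rowSums(m[, i:(i + width - 1), drop = FALSE])
  }))
  # Label each result column with the source columns it combines
  colnames(res) <- vapply(starts, function(i) {
    paste(colnames(m)[i:(i + width - 1)], collapse = "_")
  }, character(1))
  res
}

# One result matrix per window width, from 2 up to all columns
widths <- 2:ncol(mat)
res_named <- setNames(lapply(widths, function(w) window_sums(mat, w)),
                      paste0("width_", widths))
colnames(res_named$width_2)
```

Each element of res_named can then be accessed by its width, e.g. res_named$width_3 for the three-column windows.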
As this is a rolling operation, we can also use rollapply from zoo with a variable window width:
lapply(2:ncol(mat), function(j)
t(zoo::rollapply(seq_len(ncol(mat)), j, function(x) rowSums(mat[,x]))))
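For a wide matrix, such as the 600-column case mentioned in the question, one possible alternative is to compute row-wise cumulative sums once and obtain each window sum as a difference of two cumulative-sum columns. This is a different technique from the two approaches above, shown only as a sketch; cs and res_cs are names made up for this example, and no timing has been measured here.

```r
# Example matrix from the question (16 rows, 6 columns)
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6)
dimnames(mat) <- list(c("a", "c", "f", "h", "i", "j", "l", "m",
                        "p", "q", "s", "t", "u", "v", "x", "z"),
                      as.character(1:6))

n <- ncol(mat)

# cs[, k + 1] holds the sum of columns 1..k of mat for each row;
# the leading column is all zeros so every window can be written as a difference.
cs <- cbind(0, t(apply(mat, 1, cumsum)))

# For width w, the sum of columns i..(i + w - 1) equals cs[, i + w] - cs[, i]
res_cs <- lapply(2:n, function(w) {
  out <- cs[, (w + 1):(n + 1), drop = FALSE] - cs[, 1:(n - w + 1), drop = FALSE]
  colnames(out) <- NULL  # drop the column labels inherited from cs
  out
})
```

The list res_cs has one matrix per window width (2 to 6 here), in the same order as the lapply results above.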
Upvotes: 2