Reputation: 366
I have what I thought would be a simple problem, but I haven't been able to find an appropriate answer. I have a multidimensional array v[x,y,z] and I would like to apply a function to the array along the z dimension using a grouping variable (group). Here is an example (in R):
v<-1:81
dim(v)<-c(3,3,9)
group<-c('a','a','a','b','b','b','c','c','c')
Given that the grouping variable has 3 levels (a, b and c), the result (out) I'm looking for is an array of dimension 3x3x3. I can obtain out using the following code for the above example:
out1<-apply(v[,,c(1:3)],c(1,2),sum)
out2<-apply(v[,,c(4:6)],c(1,2),sum)
out3<-apply(v[,,c(7:9)],c(1,2),sum)
library(abind)
out<-abind(out1, out2, out3, along=3)
My question is whether there is a general way of obtaining the above result that can be applied to arrays with large dimensions and long grouping vectors.
Upvotes: 9
Views: 1902
Reputation: 39747
Use split to divide the z indices by group and go over the groups with lapply. Use each index vector to subset the array and use sum in apply. Simplify the resulting list to an array with simplify2array.
x <- simplify2array( lapply(split(seq_along(group), group), \(i)
apply(v[,,i], 1:2, sum)) )
all.equal(x, out, check.attributes = FALSE)
#[1] TRUE
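To make the intermediate step visible, here is a minimal sketch using the example data from the question. It shows what split returns and how the per-group matrices end up stacked along the third dimension. The name idx is only illustrative, and drop = FALSE is an addition that keeps the subset three-dimensional.

```r
# Same example data as in the question
v <- array(1:81, dim = c(3, 3, 9))
group <- c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c')

# split() turns the grouping vector into a named list of z indices,
# one integer vector per group level
idx <- split(seq_along(group), group)
str(idx)

# One 3x3 matrix of sums per group, stacked along a new third dimension
x <- simplify2array(lapply(idx, function(i) {
  apply(v[, , i, drop = FALSE], 1:2, sum)
}))
dim(x)
```

Because the indices come from split, the groups do not have to be contiguous along z.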
In this case rowSums could also be used.
# drop = FALSE keeps three dimensions even when a group has a single slice
x <- simplify2array( lapply(split(seq_along(group), group), \(i)
  rowSums(v[,,i, drop = FALSE], dims = 2)) )
Another way is to use tapply inside apply; the dimensions of the result then need to be reordered with aperm.
x <- apply(v, 1:2, tapply, group, sum)
all.equal(aperm(x, c(2,3,1)), out, check.attributes = FALSE)
#[1] TRUE
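The reordering is needed because apply places the values returned by the function in the first dimension of its result. A small sketch on the example data from the question, with y as an illustrative name:

```r
v <- array(1:81, dim = c(3, 3, 9))
group <- c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c')

# For each (row, column) cell, tapply returns one sum per group,
# and apply stores those sums along the FIRST dimension
x <- apply(v, 1:2, tapply, group, sum)
dim(x)   # (group, row, column)

# Move the group dimension to the end: (row, column, group)
y <- aperm(x, c(2, 3, 1))
dim(y)
```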
Upvotes: 0
Reputation: 60000
Using the package raster might be more appropriate for your needs. It has some code optimised for handling remotely sensed data, taking care of processing in chunks. Consider this example:
library(raster)
## Make 12 rasters, maybe one for each month of the year
for( i in seq(12) ){
  assign( paste0( "r" , i ) , raster( matrix(runif(1e3) , nrow = 1e2 ) ) )
}
## Create a raster stack from these
rS <- stack( mget( paste0("r",1:12) , envir = .GlobalEnv ) )
## Use calc to get mean, using by to group by a variable
## In this example I use the vector (1,1,1,2,2,2,3,3,3,4,4,4)
## meaning I get means for the first 3 rasters, then the next 3 etc
## So I get a mean for each quarter
rMean <- calc( rS , fun = function(x){ by(x , c( rep( 1:4 , each=3 ) ) , mean ) } )
Which returns a raster brick with 4 layers (one mean for each quarter):
class : RasterBrick
dimensions : 100, 10, 1000, 4 (nrow, ncol, ncell, nlayers)
resolution : 0.1, 0.01 (x, y)
extent : 0, 1, 0, 1 (xmin, xmax, ymin, ymax)
coord. ref. : NA
data source : in memory
names : X1, X2, X3, X4
min values : 0.02096586, 0.04015260, 0.04704145, 0.05884161
max values : 0.9727491, 0.9303025, 0.9804486, 0.9934670
I hope you can adapt this to your data.
Upvotes: 5
Reputation: 89107
Easy:
out <- apply(v, c(1, 2), by, group, sum)
apply puts the per-group results in the first dimension, so to get the dimensions in the same order as your out, reorder them with aperm:
out <- aperm(apply(v, c(1, 2), by, group, sum), c(2, 3, 1))
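For reuse on larger arrays and longer grouping vectors, the same pattern can be wrapped in a small function. This is only a sketch: group_apply is a hypothetical name, it uses tapply in place of by, and it assumes a 3-d array and a FUN that returns a single value per group.

```r
# Hypothetical helper, not part of the original answer
group_apply <- function(arr, group, FUN, ...) {
  stopifnot(length(dim(arr)) == 3, length(group) == dim(arr)[3])
  g <- factor(group)  # drops unused levels and fixes the output order
  res <- apply(arr, c(1, 2), function(z) tapply(z, g, FUN, ...))
  # apply() puts the group dimension first (and drops it when there is
  # only one group), so rebuild the shape explicitly before reordering
  res <- array(res, dim = c(nlevels(g), dim(arr)[1:2]))
  res <- aperm(res, c(2, 3, 1))
  dimnames(res) <- list(NULL, NULL, levels(g))
  res
}

v <- array(1:81, dim = c(3, 3, 9))
group <- c('b', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'c')  # need not be contiguous
out <- group_apply(v, group, sum)
dim(out)
```

The third dimension is named after the group levels, so a single group can be pulled out with out[, , "a"].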
Upvotes: 8
Reputation: 25484
This is much easier if your data is formatted as a data frame:
library(plyr)
vd <- adply(v, 1:3)
head(vd)
  X1 X2 X3 V1
1  1  1  1  1
2  2  1  1  2
3  3  1  1  3
4  1  2  1  4
5  2  2  1  5
6  3  2  1  6
Then, you can simply attach your grouping...
vd$group <- rep(group, rep(3 * 3, length(group)))
...and split according to this grouping:
daply(vd, .(group), function(df) { ... } )
The anonymous function { ... } will be called once for each group, with df containing the sub-data-frame corresponding to that group. Here you could recombine and aggregate the data into a matrix using similar machinery. The function should return an array of dimensions 3x3x1; daply will concatenate these to form the desired result.
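The body of the function is left open above. As an alternative that stays in long format, here is a sketch that assumes base R only: it builds the same long table with expand.grid instead of adply and aggregates with tapply instead of daply. The name out_long is illustrative.

```r
v <- array(1:81, dim = c(3, 3, 9))
group <- c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c')

# Long format: one row per array cell. expand.grid varies X1 fastest,
# which matches the element order of as.vector(v)
vd <- expand.grid(X1 = 1:3, X2 = 1:3, X3 = 1:9)
vd$V1 <- as.vector(v)
vd$group <- group[vd$X3]  # look up the group of each z index

# Sum V1 within each (X1, X2, group) combination; the result is a
# 3 x 3 x 3 array with the groups along the third dimension
out_long <- tapply(vd$V1, list(vd$X1, vd$X2, vd$group), sum)
dim(out_long)
```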
Upvotes: 2