Derek

Reputation: 33

Averaging a multidimensional array without NA values in R

I have a problem averaging a multidimensional array. For example, I have a three-dimensional 4*4*3 array x:

x
 , , 1

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]  0.5   NA   NA   NA
[3,]   NA   NA   NA   NA
[4,]   NA   NA   NA   NA

, , 2

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]  0.7   NA   NA   NA
[3,]  0.4   NA   NA   NA
[4,]   NA   NA   NA   NA

, , 3

     [,1] [,2] [,3] [,4]
[1,]   NA   NA  0.8   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA
[4,]   NA   NA   NA   NA

What I want is the sum ignoring NA values, then divided by the number of non-NA elements.

Basically, the result should look like this:

     [,1] [,2] [,3] [,4]
[1,]  0.0    0  0.8    0
[2,]  0.6    0  0.0    0
[3,]  0.4    0  0.0    0
[4,]  0.0    0  0.0    0

In MATLAB I do it like this: nansum(x, 3)./sum(~isnan(x), 3). I have tried a lot in R, such as apply(x, 3, sum, na.rm = T) or Reduce, trying to first get the preliminary result (the sum ignoring NA):

     [,1] [,2] [,3] [,4]
[1,]  0.0    0  0.8    0
[2,]  1.2    0  0.0    0
[3,]  0.4    0  0.0    0
[4,]  0.0    0  0.0    0

but I still have not managed it. Does anyone have any hints?

Upvotes: 3

Views: 207

Answers (3)

Jilber Urbina

Reputation: 61164

Maybe this could be useful:

 # Create your array (I know this is an ugly way to do it :D)
 Array <- array(NA_real_, dim = c(4, 4, 3))
 Array[2, 1, 1] <- 0.5
 Array[2:3, 1, 2] <- c(0.7, 0.4)
 Array[1, 3, 3] <- 0.8
 Array # this is your array (Array is not a very original name)
, , 1

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]  0.5   NA   NA   NA
[3,]   NA   NA   NA   NA
[4,]   NA   NA   NA   NA

, , 2

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]  0.7   NA   NA   NA
[3,]  0.4   NA   NA   NA
[4,]   NA   NA   NA   NA

, , 3

     [,1] [,2] [,3] [,4]
[1,]   NA   NA  0.8   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA
[4,]   NA   NA   NA   NA


 # one way to get what you want could be...
 (result <- apply(Array, c(1,2), mean, na.rm=TRUE))
     [,1] [,2] [,3] [,4]
[1,]  NaN  NaN  0.8  NaN
[2,]  0.6  NaN  NaN  NaN
[3,]  0.4  NaN  NaN  NaN
[4,]  NaN  NaN  NaN  NaN

 # if you want zeroes instead of NaN as your desired output example shows...
 result[is.nan(result)] <- 0

 result
     [,1] [,2] [,3] [,4]
[1,]  0.0    0  0.8    0
[2,]  0.6    0  0.0    0
[3,]  0.4    0  0.0    0
[4,]  0.0    0  0.0    0
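
A vectorised alternative to the apply call above, as a minimal sketch: base R's rowMeans accepts a dims argument, and with dims = 2 it averages over everything after the first two dimensions. The variable names x and avg are only for illustration, and the array is rebuilt from the values in the question.

```r
# Rebuild the 4 x 4 x 3 example array from the question.
x <- array(NA_real_, dim = c(4, 4, 3))
x[2, 1, 1] <- 0.5
x[2:3, 1, 2] <- c(0.7, 0.4)
x[1, 3, 3] <- 0.8

# dims = 2 treats the first two dimensions as "rows", so the mean is
# taken over the third dimension; na.rm = TRUE drops the NAs first.
avg <- rowMeans(x, na.rm = TRUE, dims = 2)

# Cells with no non-NA values come back as NaN; set them to zero.
avg[is.nan(avg)] <- 0
avg
```

On a small array like this the difference is cosmetic, but rowMeans avoids calling mean once per cell, which may matter for large arrays.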

Upvotes: 3

Andrie

Reputation: 179448

You are on the right track with using apply and na.rm=TRUE. You simply need to specify multiple dimensions to apply over, using the argument MARGIN=c(..., ...).

Here is an example using the built-in dataset Titanic:

str(Titanic)
 table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
 - attr(*, "dimnames")=List of 4
  ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
  ..$ Sex     : chr [1:2] "Male" "Female"
  ..$ Age     : chr [1:2] "Child" "Adult"
  ..$ Survived: chr [1:2] "No" "Yes"

Now sum over the 1st and 2nd dimensions (Class and Sex), keeping the 3rd and 4th (Age and Survived) as the margins:

apply(Titanic, c(3, 4), sum, na.rm=TRUE)
       Survived
Age       No Yes
  Child   52  57
  Adult 1438 654
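
To make the role of MARGIN concrete, here is a small sketch on a made-up 2 x 3 x 4 array (the names a, kept_12 and kept_3 are hypothetical). The dimensions listed in MARGIN are the ones that survive; all others are collapsed by the function. This is probably why apply(x, 3, sum, na.rm = T) from the question returns one number per slice rather than a 4 x 4 matrix.

```r
# A small hypothetical array with dimensions 2 x 3 x 4, filled with 1:24.
a <- array(1:24, dim = c(2, 3, 4))

# MARGIN = c(1, 2) keeps dimensions 1 and 2 and collapses dimension 3.
kept_12 <- apply(a, c(1, 2), sum)
dim(kept_12)    # 2 3

# MARGIN = 3 keeps only dimension 3 and collapses dimensions 1 and 2.
kept_3 <- apply(a, 3, sum)
length(kept_3)  # 4
```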

Upvotes: 4

johannes

Reputation: 14433

Maybe something like this:

apply(x, c(1,2), sum, na.rm=TRUE)

Note, this is untested, due to the lack of a reproducible dataset.
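
For a reproducible check, the sketch below rebuilds the array from the values in the question and then mirrors the MATLAB expression nansum(x, 3)./sum(~isnan(x), 3): one apply call for the NA-free sum and one for the count of non-NA values. The names total, count and avg are only for illustration.

```r
# Rebuild the 4 x 4 x 3 example array from the question.
x <- array(NA_real_, dim = c(4, 4, 3))
x[2, 1, 1] <- 0.5
x[2:3, 1, 2] <- c(0.7, 0.4)
x[1, 3, 3] <- 0.8

# Sum over the third dimension, ignoring NAs (like nansum(x, 3)).
total <- apply(x, c(1, 2), sum, na.rm = TRUE)

# Number of non-NA values per cell (like sum(~isnan(x), 3)).
count <- apply(!is.na(x), c(1, 2), sum)

# Divide where there is at least one value; use 0 for all-NA cells.
avg <- ifelse(count > 0, total / count, 0)
avg
```

Here total is the preliminary result described in the question, and avg is the final average with zeros where every value was NA.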

Upvotes: 3
