Reputation: 33
I have a problem averaging over a multidimensional array. For example, I have a three-dimensional 4×4×3 array x:
x
, , 1
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] 0.5 NA NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
, , 2
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] 0.7 NA NA NA
[3,] 0.4 NA NA NA
[4,] NA NA NA NA
, , 3
[,1] [,2] [,3] [,4]
[1,] NA NA 0.8 NA
[2,] NA NA NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
What I want is the sum over the third dimension ignoring NA, divided by the number of non-NA elements. Basically, the result should look like this:
[,1] [,2] [,3] [,4]
[1,] 0 0 0.8 0
[2,] 0.6 0 0 0
[3,] 0.4 0 0 0
[4,] 0 0 0 0
In MATLAB I do it like this: nansum(x, 3)./sum(~isnan(x), 3)
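For reference, here is a minimal sketch of a direct R counterpart to that MATLAB expression. It assumes x is the numeric array printed above and rebuilds it so the snippet runs on its own. Base R's rowSums() with dims = 2 sums over every dimension after the first two, so rowSums(x, dims = 2, na.rm = TRUE) plays the role of nansum(x, 3). Applying the same call to !is.na(x) counts the non-NA entries, like sum(~isnan(x), 3).

```r
# Rebuild the example array (all NA except four cells)
x <- array(NA_real_, dim = c(4, 4, 3))
x[2, 1, 1] <- 0.5
x[2:3, 1, 2] <- c(0.7, 0.4)
x[1, 3, 3] <- 0.8

# Like nansum(x, 3): sum over the 3rd dimension, ignoring NA
s <- rowSums(x, dims = 2, na.rm = TRUE)
# Like sum(~isnan(x), 3): count the non-NA values in each [row, column] cell
n <- rowSums(!is.na(x), dims = 2)

avg <- s / n           # 0/0 gives NaN where a cell is NA in every slice
avg[is.nan(avg)] <- 0  # use 0 there, as in the desired output
avg
```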
I tried a lot of things in R, like apply(x, 3, sum, na.rm = T) or Reduce, first trying to get the preliminary result (the sum without NA):
[,1] [,2] [,3] [,4]
[1,] 0 0 0.8 0
[2,] 1.2 0 0 0
[3,] 0.4 0 0 0
[4,] 0 0 0 0
but I still have not managed it. Does anyone have any hints?
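Since Reduce() was mentioned, here is one way it could work. This is a sketch that assumes x is the array above, rebuilt so the snippet is self-contained. It splits x into its three 4×4 slices, then folds `+` over them twice. The first fold runs over the slices with NA replaced by 0, which gives the preliminary sum. The second runs over the !is.na() masks, which gives the per-cell counts.

```r
x <- array(NA_real_, dim = c(4, 4, 3))
x[2, 1, 1] <- 0.5
x[2:3, 1, 2] <- c(0.7, 0.4)
x[1, 3, 3] <- 0.8

# List of the 4x4 slices x[, , 1], x[, , 2], x[, , 3]
slices <- lapply(seq_len(dim(x)[3]), function(k) x[, , k])

# Element-wise sum across slices, treating NA as 0 (the preliminary result)
total <- Reduce(`+`, lapply(slices, function(m) { m[is.na(m)] <- 0; m }))
# Element-wise count of non-NA values across slices
count <- Reduce(`+`, lapply(slices, function(m) !is.na(m)))

# Average where at least one value exists, 0 elsewhere
avg <- ifelse(count > 0, total / count, 0)
avg
```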
Upvotes: 3
Reputation: 61164
Maybe this could be useful
# Creating your array, I know this is an ugly way to do it :D
Array <- array(rep(NA, 16*3), dim=c(4,4,3))
Array[2,1,1] <- 0.5
Array[2:3,1,2] <- c(0.7,0.4)
Array[1,3,3] <- 0.8
Array # this is your array (Array is not a very original name)
, , 1
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] 0.5 NA NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
, , 2
[,1] [,2] [,3] [,4]
[1,] NA NA NA NA
[2,] 0.7 NA NA NA
[3,] 0.4 NA NA NA
[4,] NA NA NA NA
, , 3
[,1] [,2] [,3] [,4]
[1,] NA NA 0.8 NA
[2,] NA NA NA NA
[3,] NA NA NA NA
[4,] NA NA NA NA
# one way to get what you want could be...
(result <- apply(Array, c(1,2), mean, na.rm=TRUE))
[,1] [,2] [,3] [,4]
[1,] NaN NaN 0.8 NaN
[2,] 0.6 NaN NaN NaN
[3,] 0.4 NaN NaN NaN
[4,] NaN NaN NaN NaN
# if you want zeroes instead of NaN as your desired output example shows...
result[is.nan(result)] <- 0
result
[,1] [,2] [,3] [,4]
[1,] 0.0 0 0.8 0
[2,] 0.6 0 0.0 0
[3,] 0.4 0 0.0 0
[4,] 0.0 0 0.0 0
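The NaN values appear because mean() of an empty vector is NaN. With na.rm=TRUE, a cell that is NA in every slice leaves nothing to average. As a variant, the fallback to 0 can live inside the function passed to apply(), so no separate replacement step is needed. This sketch assumes the same Array as above, rebuilt here; mean_or_zero is a hypothetical helper name, not part of base R.

```r
Array <- array(NA_real_, dim = c(4, 4, 3))
Array[2, 1, 1] <- 0.5
Array[2:3, 1, 2] <- c(0.7, 0.4)
Array[1, 3, 3] <- 0.8

# Hypothetical helper: mean of the non-NA values, or 0 if there are none
mean_or_zero <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) 0 else mean(v)
}

# MARGIN = c(1, 2): the helper is called once per [row, column] cell,
# receiving that cell's values across the 3rd dimension
result <- apply(Array, c(1, 2), mean_or_zero)
result
```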
Upvotes: 3
Reputation: 179448
You are on the right track with using apply and na.rm=TRUE. You simply need to specify multiple dimensions to apply over, using the argument MARGIN=c(..., ...).
Here is an example using the built-in dataset Titanic:
str(Titanic)
table [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
- attr(*, "dimnames")=List of 4
..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
..$ Sex : chr [1:2] "Male" "Female"
..$ Age : chr [1:2] "Child" "Adult"
..$ Survived: chr [1:2] "No" "Yes"
Now sum over the 1st and 2nd dimensions, keeping the 3rd and 4th (Age and Survived):
apply(Titanic, c(3, 4), sum, na.rm=TRUE)
Survived
Age No Yes
Child 52 57
Adult 1438 654
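The same rule explains what happened with apply(x, 3, sum, na.rm = T) in the question. MARGIN names the dimensions that are kept. So MARGIN = 3 collapses each 4×4 slice to a single number, while MARGIN = c(1, 2) keeps rows and columns and collapses the third dimension. Here is a small sketch on the question's array, rebuilt so it runs standalone:

```r
x <- array(NA_real_, dim = c(4, 4, 3))
x[2, 1, 1] <- 0.5
x[2:3, 1, 2] <- c(0.7, 0.4)
x[1, 3, 3] <- 0.8

# MARGIN = 3: one sum per slice, giving a length-3 vector
per_slice <- apply(x, 3, sum, na.rm = TRUE)
per_slice  # 0.5 1.1 0.8

# MARGIN = c(1, 2): one sum per [row, column] cell, giving a 4x4 matrix
per_cell <- apply(x, c(1, 2), sum, na.rm = TRUE)
dim(per_cell)  # 4 4
```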
Upvotes: 4
Reputation: 14433
Maybe something like this:
apply(x, c(1,2), sum, na.rm=TRUE)
Note: this is untested, due to the lack of a reproducible dataset.
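Run on the question's data (rebuilt below), that call returns the NA-ignoring sum, i.e. the preliminary result with 1.2 in cell [2, 1]. Here is a minimal sketch of turning it into the average with the same apply() approach: divide by a second apply() that counts non-NA values per cell. Using pmax(n, 1) to turn 0/0 into 0/1 for all-NA cells is an assumption of this sketch, not something from the original answers.

```r
x <- array(NA_real_, dim = c(4, 4, 3))
x[2, 1, 1] <- 0.5
x[2:3, 1, 2] <- c(0.7, 0.4)
x[1, 3, 3] <- 0.8

s <- apply(x, c(1, 2), sum, na.rm = TRUE)  # sum ignoring NA
n <- apply(!is.na(x), c(1, 2), sum)        # number of non-NA values per cell
s[2, 1]  # 1.2, the preliminary result

# All-NA cells have s == 0 and n == 0, so dividing by pmax(n, 1) yields 0 there
avg <- s / pmax(n, 1)
avg
```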
Upvotes: 3