moth

Reputation: 2389

mean for matrices within a matrix in base R or dplyr

Consider the following matrix:

  tt <-  structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 223.26217771938, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 233.317380407033, 228.230147000785, 
NA, NA, NA, NA, NA, NA, NA, NA, 213.976634238414, 202.420354707722, 
235.306183514161, NA, NA, NA, NA, NA, NA, NA, 234.959570990415, 
209.098063118719, 218.561204242656, 222.512920973143, NA, NA, 
NA, NA, NA, NA, 208.300264042079, 215.937490955137, 237.957979483774, 
192.688868386319, 235.076583265965, NA, NA, NA, NA, NA, 206.523606398881, 
223.937491278258, 223.926327170344, 214.32218737219, 226.512692801088, 
201.218786399282, NA, NA, NA, NA, 224.281073655358, 213.943917885038, 
238.593797069413, 203.435493461687, 229.752040252094, 219.155196151038, 
218.091723822799, NA, NA, NA, 220.671701855947, 201.380237362061, 
232.187424293393, 191.10206696946, 234.448288541418, 178.759615126012, 
214.037379912949, 204.514058196497, NA, NA, 232.924880594581, 
229.573517636508, 197.886331008486, 231.900840878165, 221.634834807167, 
227.927620090238, 232.886238322491, 239.428486191598, 231.987068605127, 
NA), .Dim = c(10L, 10L), .Dimnames = list(c("SA1", "SA1", "SA1", 
"SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2"), c("SA1", "SA1", 
"SA1", "SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2")))

It looks like this:

   SA1      SA1      SA1      SA1      SA2      SA2      SA2      SA2      SA2      SA2
SA1  NA 223.2622 233.3174 213.9766 234.9596 208.3003 206.5236 224.2811 220.6717 232.9249
SA1  NA       NA 228.2301 202.4204 209.0981 215.9375 223.9375 213.9439 201.3802 229.5735
SA1  NA       NA       NA 235.3062 218.5612 237.9580 223.9263 238.5938 232.1874 197.8863
SA1  NA       NA       NA       NA 222.5129 192.6889 214.3222 203.4355 191.1021 231.9008
SA2  NA       NA       NA       NA       NA 235.0766 226.5127 229.7520 234.4483 221.6348
SA2  NA       NA       NA       NA       NA       NA 201.2188 219.1552 178.7596 227.9276
SA2  NA       NA       NA       NA       NA       NA       NA 218.0917 214.0374 232.8862
SA2  NA       NA       NA       NA       NA       NA       NA       NA 204.5141 239.4285
SA2  NA       NA       NA       NA       NA       NA       NA       NA       NA 231.9871
SA2  NA       NA       NA       NA       NA       NA       NA       NA       NA       NA

I would like to calculate the mean of the SA1 and SA2 sub-matrices. By sub-matrix I mean the block of cells whose row name and column name are both SA1, and likewise the block whose row name and column name are both SA2. For SA1 this would be mean(tt[1:4, 1:4], na.rm = TRUE). However, my real matrix is much bigger than this example, so subsetting by index is not practical; I need some sort of grouping by the distinct row names and column names. A solution in both base R and dplyr would be awesome.

Upvotes: 2

Views: 55

Answers (3)

akrun

Reputation: 887193

An option with dplyr and reshape2. We can melt 'tt' into 'long' format, filter the rows where the row name and column name are the same, then group by 'Var1' and take the mean of the 'value' column.

library(dplyr)
library(reshape2)
melt(tt) %>% 
   filter(Var1 == Var2) %>%
   group_by(Var1) %>%
   summarise(value = mean(value, na.rm = TRUE))
# A tibble: 2 x 2
#  Var1  value
#  <fct> <dbl>
#1 SA1    223.
#2 SA2    221.

Upvotes: 1

rg255

Reputation: 4169

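This starts from sub_list, a vector of the unique column names, and iterates over it. For each name, it subsets the rows and columns carrying that name and stores the mean of that block in means, a numeric vector named after the groups. (Writing the means back into sub_list would coerce them to character, because sub_list is a character vector.)

sub_list <- unique(colnames(tt))
means <- setNames(numeric(length(sub_list)), sub_list)

for (j in seq_along(sub_list)) {
  # keep only the rows and columns whose names match the current group
  means[j] <- mean(tt[rownames(tt) == sub_list[j], colnames(tt) == sub_list[j]], na.rm = TRUE)
}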

Upvotes: 1

Ronak Shah

Reputation: 388992

We could loop over the unique column names of the matrix using sapply, subset the rows and columns that match each name, and take the mean of each sub-matrix.

sapply(unique(colnames(tt)), function(x) 
     mean(tt[rownames(tt) == x, colnames(tt) == x], na.rm = TRUE))

#  SA1   SA2 
#222.8 221.0 
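
As a possible alternative without an explicit loop, the same grouping could be sketched with a logical mask and tapply. This is only an illustration, not part of the answer above: the toy matrix m and the names same_group, group and block_means are made up for the example.

```r
# Toy matrix with duplicated dimnames (values are invented for illustration)
m <- matrix(c(NA,  1,  2,  3,
              NA, NA,  4,  5,
              NA, NA, NA,  6,
              NA, NA, NA, NA),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "A", "B", "B"), c("A", "A", "B", "B")))

# TRUE where a cell's row name equals its column name,
# i.e. where the cell lies inside one of the diagonal blocks
same_group <- outer(rownames(m), colnames(m), "==")

# Group label of every cell, taken from its row name (column-major order)
group <- rownames(m)[row(m)]

# Mean of the cells in each diagonal block, ignoring NA
block_means <- tapply(m[same_group], group[same_group], mean, na.rm = TRUE)
block_means
```

Applied to the question's data, m would simply be replaced by tt; the mask avoids subsetting the matrix once per group, which may matter for a large number of distinct names.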

Upvotes: 2
