Reputation: 2389
Consider the following matrix:
tt <- structure(c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 223.26217771938,
NA, NA, NA, NA, NA, NA, NA, NA, NA, 233.317380407033, 228.230147000785,
NA, NA, NA, NA, NA, NA, NA, NA, 213.976634238414, 202.420354707722,
235.306183514161, NA, NA, NA, NA, NA, NA, NA, 234.959570990415,
209.098063118719, 218.561204242656, 222.512920973143, NA, NA,
NA, NA, NA, NA, 208.300264042079, 215.937490955137, 237.957979483774,
192.688868386319, 235.076583265965, NA, NA, NA, NA, NA, 206.523606398881,
223.937491278258, 223.926327170344, 214.32218737219, 226.512692801088,
201.218786399282, NA, NA, NA, NA, 224.281073655358, 213.943917885038,
238.593797069413, 203.435493461687, 229.752040252094, 219.155196151038,
218.091723822799, NA, NA, NA, 220.671701855947, 201.380237362061,
232.187424293393, 191.10206696946, 234.448288541418, 178.759615126012,
214.037379912949, 204.514058196497, NA, NA, 232.924880594581,
229.573517636508, 197.886331008486, 231.900840878165, 221.634834807167,
227.927620090238, 232.886238322491, 239.428486191598, 231.987068605127,
NA), .Dim = c(10L, 10L), .Dimnames = list(c("SA1", "SA1", "SA1",
"SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2"), c("SA1", "SA1",
"SA1", "SA1", "SA2", "SA2", "SA2", "SA2", "SA2", "SA2")))
It looks like this:
SA1 SA1 SA1 SA1 SA2 SA2 SA2 SA2 SA2 SA2
SA1 NA 223.2622 233.3174 213.9766 234.9596 208.3003 206.5236 224.2811 220.6717 232.9249
SA1 NA NA 228.2301 202.4204 209.0981 215.9375 223.9375 213.9439 201.3802 229.5735
SA1 NA NA NA 235.3062 218.5612 237.9580 223.9263 238.5938 232.1874 197.8863
SA1 NA NA NA NA 222.5129 192.6889 214.3222 203.4355 191.1021 231.9008
SA2 NA NA NA NA NA 235.0766 226.5127 229.7520 234.4483 221.6348
SA2 NA NA NA NA NA NA 201.2188 219.1552 178.7596 227.9276
SA2 NA NA NA NA NA NA NA 218.0917 214.0374 232.8862
SA2 NA NA NA NA NA NA NA NA 204.5141 239.4285
SA2 NA NA NA NA NA NA NA NA NA 231.9871
SA2 NA NA NA NA NA NA NA NA NA NA
I would like to calculate the mean of the SA1 and SA2 sub-matrices. By sub-matrix I mean the cells where both the row name and the column name are SA1, and likewise the cells where both are SA2. For SA1 this would be mean(tt[1:4, 1:4], na.rm = TRUE); however, my real matrix is much bigger than this example, so subsetting by index is not a solution. I need some sort of grouping by the distinct row names and column names. It would be awesome if someone could show me a solution in both base R and dplyr.
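One way to do the grouping in base R without hard-coding indices is to build a logical mask of the cells whose row name equals their column name, then let tapply() average those cells by row name. A minimal sketch, assuming a small hypothetical matrix m that stands in for tt:

```r
# Hypothetical 4 x 4 matrix with repeated dimnames (stand-in for `tt`)
m <- matrix(c(1:15, NA), nrow = 4,
            dimnames = list(c("A", "A", "B", "B"), c("A", "A", "B", "B")))

# TRUE where the row name equals the column name, i.e. inside a same-name block
same <- outer(rownames(m), colnames(m), "==")

# Row name of every cell, in the same column-major order as m itself
cell_group <- rownames(m)[row(m)]

# Average the in-block cells by group, ignoring NAs
block_means <- tapply(m[same], cell_group[same], mean, na.rm = TRUE)
block_means  # A = 3.5, B = 38/3 (about 12.67)
```

Because the mask is built from the names, the same three lines work unchanged on a matrix of any size or number of groups.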
Upvotes: 2
Views: 55
Reputation: 887193
An option with dplyr, using melt from reshape2. We can melt 'tt' into 'long' format, filter the rows where the row name and column name are the same, then group by 'Var1' and get the mean of the 'value' column.
library(dplyr)
library(reshape2)
melt(tt) %>%
filter(Var1 == Var2) %>%
group_by(Var1) %>%
summarise(value = mean(value, na.rm = TRUE))
# A tibble: 2 x 2
# Var1 value
# <fct> <dbl>
#1 SA1 223.
#2 SA2 221.
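The same long-format idea can also be sketched in base R without reshape2: as.data.frame(as.table(...)) gives one row per cell with columns Var1, Var2 and Freq, and aggregate() then plays the role of group_by() plus summarise(). This uses a small hypothetical matrix m in place of tt:

```r
# Hypothetical small matrix with repeated dimnames (stand-in for `tt`)
m <- matrix(c(1:15, NA), nrow = 4,
            dimnames = list(c("A", "A", "B", "B"), c("A", "A", "B", "B")))

# Long format: Var1 = row name, Var2 = column name, Freq = cell value
long <- as.data.frame(as.table(m), stringsAsFactors = FALSE)

# Keep cells whose row and column names match, then average by name.
# The formula method of aggregate() drops NA rows by default (na.action = na.omit).
block_means <- aggregate(Freq ~ Var1, data = long[long$Var1 == long$Var2, ], FUN = mean)
block_means
```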
Upvotes: 1
Reputation: 4169
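This loops over the unique column names and, for each one, takes the mean of the cells whose row name and column name both match it. Subsetting the columns alone is not enough, because the SA2 columns also contain SA1 rows. The means go into a separate named numeric vector, since writing them back into the character vector of names would silently turn them into strings.
sub_list <- unique(colnames(tt))
sub_means <- setNames(numeric(length(sub_list)), sub_list)
for (j in seq_along(sub_list)) {
  # rows AND columns must belong to the same group
  sub_means[j] <- mean(tt[rownames(tt) == sub_list[j], colnames(tt) == sub_list[j]], na.rm = TRUE)
}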
Upvotes: 1
Reputation: 388992
We could loop over all the unique column names of the matrix using sapply, subset them and take the mean of each sub-matrix.
sapply(unique(colnames(tt)), function(x)
mean(tt[rownames(tt) == x, colnames(tt) == x], na.rm = TRUE))
# SA1 SA2
#222.8 221.0
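Because this subsets by name rather than by position, it also works when the rows of a group are not contiguous. A small sketch using vapply(), which guarantees a numeric result, on a hypothetical matrix m with interleaved names:

```r
# Hypothetical matrix whose groups are interleaved rather than contiguous
m <- matrix(1:16, nrow = 4,
            dimnames = list(c("A", "B", "A", "B"), c("A", "B", "A", "B")))

groups <- unique(colnames(m))

# For each group, average the cells whose row AND column names equal that group
block_means <- vapply(groups, function(g) {
  mean(m[rownames(m) == g, colnames(m) == g], na.rm = TRUE)
}, numeric(1))
block_means  # A = 6, B = 11
```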
Upvotes: 2