Reputation: 175
Let's say I have the data frame below. How can I apply an arbitrary function to all pairwise combinations of rows in a subset of the data frame? For example, how can I compute the average of every pair of the rows labelled (alpha, beta, gamma), using only the columns whose names start with "red"? In this example there are only 3 pairwise combinations to average: (1, 0, 6) & (7, 2, 10), (1, 0, 6) & (6, 3, 11), and (7, 2, 10) & (6, 3, 11). But I am looking for code that also works for many more rows and columns in a given subset, which could mean hundreds of pairwise combinations (without duplicated or repeated combinations). Thanks!
df <- read.csv("test.csv", row.names = 1, header = TRUE)
df
      red.1 red.2 red.3 yellow.1 yellow.2
alpha     1     0     6       56       59
beta      7     2    10       59       64
gamma     6     3    11      100      105
pi     1009  2104   290        6        5
Upvotes: 0
Views: 311
Reputation: 388982
You can write a function that selects rows by their row names and columns by a pattern in their names, splits the selection row-wise, builds all possible combinations of two rows, and calculates the mean of each combination.
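get_average <- function(data, row, col_pattern) {
  # Keep the chosen rows and the columns whose names match col_pattern;
  # drop = FALSE keeps a data frame even if only one column matches
  sub <- data[row, grep(col_pattern, names(data)), drop = FALSE]
  # Split into rows, then average the pooled values of each pair of rows
  combn(asplit(sub, 1), 2, function(x) mean(unlist(x)))
}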
We can then pass the row names and the column-name pattern to this function.
get_average(df, c('alpha', 'beta', 'gamma'), 'red')
#[1] 4.3 4.5 6.5
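The vector returned by `combn()` carries no labels, so with hundreds of pairs it can be hard to tell which value belongs to which pair. Below is a minimal sketch of one way to attach pair names. It runs `combn()` on the row names themselves, so the labels and the values are generated in the same order. The variable names `rows`, `sub`, and `avgs` are only illustrative, and the data frame is copied from the "data" section of this answer.

```r
# Example data, copied from the "data" section of this answer
df <- data.frame(
  red.1 = c(1L, 7L, 6L, 1009L), red.2 = c(0L, 2L, 3L, 2104L),
  red.3 = c(6L, 10L, 11L, 290L), yellow.1 = c(56L, 59L, 100L, 6L),
  yellow.2 = c(59L, 64L, 105L, 5L),
  row.names = c("alpha", "beta", "gamma", "pi")
)

rows <- c("alpha", "beta", "gamma")

# Numeric matrix holding only the chosen rows and the "red" columns
sub <- as.matrix(df[rows, grep("red", names(df)), drop = FALSE])

# For each pair of row names p, average all values in those two rows
avgs <- combn(rows, 2, function(p) mean(sub[p, ]))

# Label each result with the pair it came from, e.g. "alpha-beta"
names(avgs) <- combn(rows, 2, paste, collapse = "-")
avgs
```

The values are the same three averages as above (about 4.33, 4.5, and 6.5), now named `alpha-beta`, `alpha-gamma`, and `beta-gamma`.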
To get the correlation between each pair of rows, we can do:
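PCC <- function(data, row, col_pattern) {
  # Same row and column selection as in get_average()
  sub <- data[row, grep(col_pattern, names(data)), drop = FALSE]
  # Pearson correlation between the two rows of each pair
  combn(asplit(sub, 1), 2, function(x) cor(x[[1]], x[[2]]))
}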
PCC(df, c('alpha', 'beta', 'gamma'), 'red')
#[1] 0.87 0.98 0.96
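Since the question asks for an arbitrary function, the two helpers above can probably be folded into one that takes the function as an argument. This is only a sketch: `pairwise_apply` is a hypothetical name, and it assumes `FUN` accepts two numeric vectors (one per row) and returns a single number.

```r
# Hypothetical helper: apply FUN to every pair of the selected rows
pairwise_apply <- function(data, rows, col_pattern, FUN) {
  # Numeric matrix of the selected rows and matching columns
  m <- as.matrix(data[rows, grep(col_pattern, names(data)), drop = FALSE])
  # 2 x choose(n, 2) character matrix; each column is one pair of row names
  pairs <- combn(rows, 2)
  # Assumption: FUN(x, y) returns exactly one numeric value
  res <- vapply(seq_len(ncol(pairs)), function(i) {
    FUN(m[pairs[1, i], ], m[pairs[2, i], ])
  }, numeric(1))
  names(res) <- paste(pairs[1, ], pairs[2, ], sep = "-")
  res
}

# Example data, copied from the "data" section of this answer
df <- data.frame(
  red.1 = c(1L, 7L, 6L, 1009L), red.2 = c(0L, 2L, 3L, 2104L),
  red.3 = c(6L, 10L, 11L, 290L), yellow.1 = c(56L, 59L, 100L, 6L),
  yellow.2 = c(59L, 64L, 105L, 5L),
  row.names = c("alpha", "beta", "gamma", "pi")
)
rows <- c("alpha", "beta", "gamma")

avg_res  <- pairwise_apply(df, rows, "red", function(x, y) mean(c(x, y)))
cor_res  <- pairwise_apply(df, rows, "red", cor)
dist_res <- pairwise_apply(df, rows, "red", function(x, y) sqrt(sum((x - y)^2)))
```

Here `avg_res` and `cor_res` should reproduce the results of `get_average()` and `PCC()`, while `dist_res` shows a third function (Euclidean distance) plugged in without changing the helper.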
data
df <- structure(list(
  red.1 = c(1L, 7L, 6L, 1009L),
  red.2 = c(0L, 2L, 3L, 2104L),
  red.3 = c(6L, 10L, 11L, 290L),
  yellow.1 = c(56L, 59L, 100L, 6L),
  yellow.2 = c(59L, 64L, 105L, 5L)),
  class = "data.frame", row.names = c("alpha", "beta", "gamma", "pi"))
Upvotes: 1