Applying a function to all pairwise combinations of rows in a subset of a data frame

Question

Let's say I have the data frame below. How can I apply an arbitrary function to all pairwise combinations of rows in a subset of the data frame? For example, how can I compute the average for every pair of the rows labelled alpha, beta and gamma, using only the columns labelled "red.*"? In this example there are only 3 pairwise combinations to average: (1, 0, 6) & (7, 2, 10), (1, 0, 6) & (6, 3, 11), and (7, 2, 10) & (6, 3, 11). But I am looking for code that scales to many more rows/columns in a given subset, possibly hundreds of pairwise combinations, with no duplicated or repeated pairs. Thanks!

df <- read.csv("test.csv", row.names = 1, header = TRUE)
df
      red.1 red.2 red.3 yellow.1 yellow.2
alpha     1     0     6       56       59
beta      7     2    10       59       64
gamma     6     3    11      100      105
pi     1009  2104   290        6        5

Ronak Shah · Accepted Answer

You can write a function that selects rows by their row names and columns by a pattern in their names, splits the result row-wise, creates all possible combinations of the rows taken 2 at a time, and calculates the mean of each combination.

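get_average <- function(data, row, col_pattern) {
  # Keep the requested rows and the columns whose names match col_pattern;
  # drop = FALSE keeps a data frame even if only one column matches
  sub <- data[row, grep(col_pattern, names(data)), drop = FALSE]
  # Split into a list of rows, then average the values of each pair of rows
  combn(asplit(sub, 1), 2, function(x) mean(unlist(x)))
}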

We can then pass the row names and the column-name pattern to this function.

get_average(df, c('alpha', 'beta', 'gamma'), 'red')
#[1] 4.3 4.5 6.5
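
Since the question asks for an arbitrary function, the same idea can be sketched as a helper that takes the function as an argument and labels each result with the pair it came from. The name pairwise_apply and its FUN argument are hypothetical (they are not part of the answer above), and the sketch assumes FUN takes two numeric vectors and returns a single number.

```r
# Example data from the question
df <- data.frame(
  red.1 = c(1, 7, 6, 1009), red.2 = c(0, 2, 3, 2104), red.3 = c(6, 10, 11, 290),
  yellow.1 = c(56, 59, 100, 6), yellow.2 = c(59, 64, 105, 5),
  row.names = c("alpha", "beta", "gamma", "pi")
)

# pairwise_apply() is a hypothetical helper, not part of the original answer.
# FUN receives the two rows of a pair as numeric vectors and must return one number.
pairwise_apply <- function(data, rows, col_pattern, FUN) {
  # Numeric matrix of the selected rows and pattern-matched columns
  m <- as.matrix(data[rows, grep(col_pattern, names(data)), drop = FALSE])
  # 2 x n_pairs matrix of row names; each column is one unordered pair
  pairs <- combn(rows, 2)
  out <- vapply(
    seq_len(ncol(pairs)),
    function(i) FUN(m[pairs[1, i], ], m[pairs[2, i], ]),
    numeric(1)
  )
  # Label each result with the pair it was computed from
  names(out) <- paste(pairs[1, ], pairs[2, ], sep = "-")
  out
}

res <- pairwise_apply(df, c("alpha", "beta", "gamma"), "red",
                      function(a, b) mean(c(a, b)))
print(res)
```

Passing cor as FUN instead would return the pairwise correlations with the same labels.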

To get the correlation between each combination, we can do:

PCC <- function(data, row, col_pattern) {
  combn(asplit(data[row, grep(col_pattern, names(data))], 1), 2,
        function(x) cor(x[[1]], x[[2]]))
}

PCC(df, c('alpha', 'beta', 'gamma'), 'red')
#[1] 0.87 0.98 0.96
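
For correlation specifically, a possible alternative (a sketch, not the answer's method) is to compute the whole correlation matrix at once with cor() on the transposed subset and read off its lower triangle. For this symmetric matrix, the lower triangle in column-major order lists the pairs in the same order as combn(). The variable names rows, m, cm and pcc are chosen here for illustration.

```r
# Example data from the question
df <- data.frame(
  red.1 = c(1, 7, 6, 1009), red.2 = c(0, 2, 3, 2104), red.3 = c(6, 10, 11, 290),
  yellow.1 = c(56, 59, 100, 6), yellow.2 = c(59, 64, 105, 5),
  row.names = c("alpha", "beta", "gamma", "pi")
)

rows <- c("alpha", "beta", "gamma")
m <- as.matrix(df[rows, grep("red", names(df)), drop = FALSE])

# cor() correlates columns, so transpose to make each original row a column
cm <- cor(t(m))

# Each unordered pair appears once below the diagonal:
# (beta, alpha), (gamma, alpha), (gamma, beta)
pcc <- cm[lower.tri(cm)]
print(round(pcc, 2))
```

This avoids calling cor() once per pair, which may matter when there are hundreds of rows.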

data

df <- structure(list(red.1 = c(1L, 7L, 6L, 1009L), red.2 = c(0L, 2L, 
3L, 2104L), red.3 = c(6L, 10L, 11L, 290L), yellow.1 = c(56L, 
59L, 100L, 6L), yellow.2 = c(59L, 64L, 105L, 5L)), class = 
"data.frame", row.names = c("alpha", "beta", "gamma", "pi"))
