spiral01

Reputation: 545

R: applying a function that returns a list across multiple columns of a data frame

I am trying to apply a function that takes two inputs to every pairwise combination of the elements of this vector:

> c('EAS_MAF', 'AMR_MAF', 'AFR_MAF', 'EUR_MAF', 'SAS_MAF')
[1] "EAS_MAF" "AMR_MAF" "AFR_MAF" "EUR_MAF" "SAS_MAF"

To generate every combination of two values I am using the combn function:

> list <- combn(c('EAS_MAF', 'AMR_MAF', 'AFR_MAF', 'EUR_MAF', 'SAS_MAF'),2)
> list
     [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]      [,8]      [,9]      [,10]    
[1,] "EAS_MAF" "EAS_MAF" "EAS_MAF" "EAS_MAF" "AMR_MAF" "AMR_MAF" "AMR_MAF" "AFR_MAF" "AFR_MAF" "EUR_MAF"
[2,] "AMR_MAF" "AFR_MAF" "EUR_MAF" "SAS_MAF" "AFR_MAF" "EUR_MAF" "SAS_MAF" "EUR_MAF" "SAS_MAF" "SAS_MAF"

The function itself counts the rows that meet certain criteria and returns a list:

sharedCalc.func <- function(pop1, pop2, table = variantTable){
  S.count = sum(table[pop1]>0 & table[pop2]>0 & 
                  table['consequence'] == 'synonymous SNV')
  NS.count = sum(table[pop1]>0 & table[pop2]>0 & 
                  table['consequence'] != 'synonymous SNV')
  counts <- list("NS" = NS.count, "S" = S.count, "NS/S" = NS.count/S.count)
  return(counts)
}

Here is an example output from this function:

> sharedCalc.func('EAS_MAF', 'AMR_MAF')
$NS
[1] 59325

$S
[1] 43434

$`NS/S`
[1] 1.365865

To run this function across the columns of my matrix, I assumed the apply function would be most appropriate. However, this returns a non-conformable arrays error:

> apply(list, 2, sharedCalc.func)
Error in FUN(newX[, i], ...) : binary operation on non-conformable arrays

I also tried the outer function and received the same error:

> outer(list[1,], list[2,], sharedCalc.func)
Error in FUN(X, Y, ...) : binary operation on non-conformable arrays

I am not sure why I am getting the error. Is it possibly due to returning a list from the function? I have also tried lapply, but that does not work either. Below is the dput of the first rows of my data:

> dput(head(variantTable))
structure(list(CHROM = c("1", "1", "1", "1", "1", "1"), POS = c(69224L, 
69428L, 69486L, 69487L, 69496L, 69521L), ID = c("rs568964432", 
"rs140739101", "rs548369610", "rs568226429", "rs150690004", "rs553724620"
), REF = c("A", "T", "C", "G", "G", "T"), ALT = c("T", "G", "T", 
"A", "A", "A"), AF = c(0.000399361, 0.0189696, 0.000199681, 0.000399361, 
0.000998403, 0.000399361), AC = c(2L, 95L, 1L, 2L, 5L, 2L), AN = c(5008L, 
5008L, 5008L, 5008L, 5008L, 5008L), consequence = c("nonsynonymous SNV", 
"nonsynonymous SNV", "synonymous SNV", "nonsynonymous SNV", "nonsynonymous SNV", 
"nonsynonymous SNV"), gene = c("OR4F5", "OR4F5", "OR4F5", "OR4F5", 
"OR4F5", "OR4F5"), refGene_id = c("NM_001005484", "NM_001005484", 
"NM_001005484", "NM_001005484", "NM_001005484", "NM_001005484"
), AA_change = c("('D', 'V')", "('F', 'C')", "('N', 'N')", "('A', 'T')", 
"('G', 'S')", "('I', 'N')"), X0.fold_count = c(572L, 572L, 572L, 
572L, 572L, 572L), X4.fold_count = c(141L, 141L, 141L, 141L, 
141L, 141L), EAS_MAF = c(0, 0.003, 0.001, 0, 0, 0), AMR_MAF = c(0.0029, 
0.036, 0, 0, 0.0014, 0.0029), AFR_MAF = c(0, 0.0015, 0, 0.0015, 
0.003, 0), EUR_MAF = c(0, 0.0497, 0, 0, 0, 0), SAS_MAF = c(0, 
0.0153, 0, 0, 0, 0), nonAFR_N = c(309227L, 1128036L, 262551L, 
0L, 309227L, 309227L), nonAFR_weighted = c(0.0029, 0.0261704282487438, 
0.001, 0, 0.0014, 0.0029)), .Names = c("CHROM", "POS", "ID", 
"REF", "ALT", "AF", "AC", "AN", "consequence", "gene", "refGene_id", 
"AA_change", "X0.fold_count", "X4.fold_count", "EAS_MAF", "AMR_MAF", 
"AFR_MAF", "EUR_MAF", "SAS_MAF", "nonAFR_N", "nonAFR_weighted"
), row.names = c(NA, 6L), class = "data.frame")

Upvotes: 2

Views: 299

Answers (2)

coffeinjunky

Reputation: 11514

Try the following:

l <- combn(c('EAS_MAF', 'AMR_MAF', 'AFR_MAF', 'EUR_MAF', 'SAS_MAF'),2)
l
     [,1]      [,2]      [,3]      [,4]      [,5]      [,6]     
[1,] "EAS_MAF" "EAS_MAF" "EAS_MAF" "EAS_MAF" "AMR_MAF" "AMR_MAF"
[2,] "AMR_MAF" "AFR_MAF" "EUR_MAF" "SAS_MAF" "AFR_MAF" "EUR_MAF"
     [,7]      [,8]      [,9]      [,10]    
[1,] "AMR_MAF" "AFR_MAF" "AFR_MAF" "EUR_MAF"
[2,] "SAS_MAF" "EUR_MAF" "SAS_MAF" "SAS_MAF"

mapply(sharedCalc.func, l[1,], l[2,])
     EAS_MAF EAS_MAF EAS_MAF EAS_MAF AMR_MAF AMR_MAF AMR_MAF AFR_MAF
NS   1       1       1       1       2       1       1       1      
S    0       0       0       0       0       0       0       0      
NS/S Inf     Inf     Inf     Inf     Inf     Inf     Inf     Inf    
     AFR_MAF EUR_MAF
NS   1       1      
S    0       0      
NS/S Inf     Inf    

mapply is the multivariate version of sapply: use it when you want to traverse multiple vectors or lists in parallel, passing the i-th element of each to the function on the i-th call.
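
To make the difference from apply concrete, here is a minimal sketch with a made-up two-argument function (pair_label is a hypothetical name, not something from the question). apply hands each column over as a single length-2 vector, so a second argument is never filled in, whereas mapply feeds the two rows in element by element:

```r
# Hypothetical two-argument function, used only for illustration
pair_label <- function(a, b) paste(a, b, sep = "-")

pairs <- combn(c("EAS_MAF", "AMR_MAF", "AFR_MAF"), 2)

# mapply calls pair_label(pairs[1, 1], pairs[2, 1]), then
# pair_label(pairs[1, 2], pairs[2, 2]), and so on
labels <- mapply(pair_label, pairs[1, ], pairs[2, ], USE.NAMES = FALSE)
print(labels)

# apply passes each whole column as the first argument only, so a
# wrapper has to unpack the two values itself
labels_apply <- apply(pairs, 2, function(p) pair_label(p[1], p[2]))
print(labels_apply)
```

Both calls should give the same three labels; the apply version only works because the wrapper splits the column into its two elements.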

As a side remark: it is almost always a bad idea to mask built-in R functions with your own objects. Calling an object list is therefore best avoided, which is why I renamed it to l in the code above.


To keep the column names, one could do something like this:

out <- mapply(sharedCalc.func, l[1,], l[2,])
setNames(data.frame(out), mapply(paste, l[1,], l[2,], sep="-"))
     EAS_MAF-AMR_MAF EAS_MAF-AFR_MAF EAS_MAF-EUR_MAF EAS_MAF-SAS_MAF
NS                 1               1               1               1
S                  0               0               0               0
NS/S             Inf             Inf             Inf             Inf
     AMR_MAF-AFR_MAF AMR_MAF-EUR_MAF AMR_MAF-SAS_MAF AFR_MAF-EUR_MAF
NS                 2               1               1               1
S                  0               0               0               0
NS/S             Inf             Inf             Inf             Inf
     AFR_MAF-SAS_MAF EUR_MAF-SAS_MAF
NS                 1               1
S                  0               0
NS/S             Inf             Inf

Upvotes: 2

KenHBS

Reputation: 7164

You want R to take the two values in column 1 as the inputs, then move on to column 2, and so on.

inputs <- combn(c('EAS_MAF', 'AMR_MAF', 'AFR_MAF', 'EUR_MAF', 'SAS_MAF'),2)
output <- Map(sharedCalc.func, inputs[1, ], inputs[2, ])

Map takes the first values of inputs[1, ] and inputs[2, ] as the arguments for the first call to sharedCalc.func and stores the result in a list. It then proceeds to the second values, and so on, until all values have been used. So output is now a list of 10 named sublists, one per combination.
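
As a self-contained illustration of the same pattern, here is a sketch on a toy table. The data values, the name toy, and the simplified shared_counts function are all invented for this example and are not the asker's data or function. It also shows one possible way to label each result by its pair and collapse the list into a data frame:

```r
# Toy stand-in for variantTable (values invented for illustration)
toy <- data.frame(
  consequence = c("synonymous SNV", "nonsynonymous SNV",
                  "synonymous SNV", "nonsynonymous SNV"),
  EAS_MAF = c(0.1, 0.2, 0.0, 0.3),
  AMR_MAF = c(0.2, 0.1, 0.1, 0.0),
  AFR_MAF = c(0.0, 0.3, 0.2, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical simplified counter: rows where both populations have MAF > 0
shared_counts <- function(pop1, pop2, table = toy) {
  both <- table[[pop1]] > 0 & table[[pop2]] > 0
  syn  <- table[["consequence"]] == "synonymous SNV"
  list(NS = sum(both & !syn), S = sum(both & syn))
}

inputs <- combn(c("EAS_MAF", "AMR_MAF", "AFR_MAF"), 2)
output <- Map(shared_counts, inputs[1, ], inputs[2, ])

# Map names each result after the first argument only, so relabel by pair
names(output) <- paste(inputs[1, ], inputs[2, ], sep = "-")

# Collapse the list of lists into one row per pair
result <- do.call(rbind, lapply(output, as.data.frame))
print(result)
```

Using [[ ]] to pull each column out as a plain vector keeps every comparison the same length, which sidesteps the matrix-shaped comparisons that single-bracket indexing of a data frame produces.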

Note: The output below does not match the example in your question, so something may be off with the function or with the data I am running it on (I only have the six rows from your dput). I get the same output when I call sharedCalc.func("EAS_MAF", "AMR_MAF") directly:

output[[1]]
# $NS
# [1] 1
# $S
# [1] 0
# $`NS/S`
# [1] Inf
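
For what it is worth, the counts above can be checked by hand against the six rows in the question's dput. A minimal sketch, copying only the three columns involved (sample6 is a name introduced here), suggests that only one row has both EAS_MAF and AMR_MAF above zero, and that row is nonsynonymous, which would account for NS = 1 and S = 0 on this sample:

```r
# The three relevant columns, copied from dput(head(variantTable))
sample6 <- data.frame(
  consequence = c("nonsynonymous SNV", "nonsynonymous SNV", "synonymous SNV",
                  "nonsynonymous SNV", "nonsynonymous SNV", "nonsynonymous SNV"),
  EAS_MAF = c(0, 0.003, 0.001, 0, 0, 0),
  AMR_MAF = c(0.0029, 0.036, 0, 0, 0.0014, 0.0029),
  stringsAsFactors = FALSE
)

# Rows where the variant is present in both populations
both <- sample6$EAS_MAF > 0 & sample6$AMR_MAF > 0
shared <- sample6[both, ]
print(shared)

# Split the shared rows by consequence
ns_count <- sum(shared$consequence != "synonymous SNV")
s_count  <- sum(shared$consequence == "synonymous SNV")
print(c(NS = ns_count, S = s_count))
```

The full table presumably has many more rows than this sample, which would explain the much larger counts in the question's example output.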

Upvotes: 2
