Komal Rathi

Reputation: 4274

Convert plyr::ddply to dplyr

I have a dataframe like this:

tmp <- read.table(header = TRUE, text = "gene_id   gene_symbol ensembl_id  keep val1   val2    val3
                  x     a   Multiple    Yes 1   2   3
                  x1    a   Multiple    No  2   3   4
                  x2    a   Multiple    No  1   4   3
                  y     b   Multiple    Yes 22  20  12
                  y1    b   Multiple    No  98  7   97
                  y2    b   Multiple    No  8   76  6")

I am trying to group by the gene_symbol variable and, within each group, calculate the correlation between the row with keep == "Yes" and each of the other rows (keep == "No"). I then want to return the average of those correlations along with the gene_symbol and gene_id. This is the function:

library(dplyr)  # provides %>%, filter() and select()

# function to calculate avg. correlation
calc.mean.corr <- function(x){
  gene.id <- x[which(x$keep == "Yes"),"gene_id"]
  x1 <- x %>% 
    filter(keep == "Yes") %>%
    select(-c(gene_id, gene_symbol, ensembl_id, keep)) %>%
    as.numeric()
  x2 <- x %>% 
    filter(keep == "No") %>%
    select(-c(gene_id, gene_symbol, ensembl_id, keep))

  # correlation of kept id with discarded ids
  # (result stored as avg.cor so it does not shadow stats::cor)
  avg.cor <- mean(apply(x2, 1, FUN = function(y) cor(x1, y)))
  avg.cor <- round(avg.cor, digits = 2)
  df <- data.frame(avg.cor = avg.cor, gene_id = gene.id)
  return(df)
}

# call using ddply
for.corr <- plyr::ddply(tmp, .variables = "gene_symbol", .fun = calc.mean.corr)

The final output looks like this:

> for.corr
  gene_symbol avg.cor gene_id
1           a    0.83       x
2           b    0.02       y

I am using plyr::ddply for this but want to use dplyr instead. However, I am not sure how to convert it to dplyr format. Any help would be much appreciated.

Upvotes: 1

Views: 213

Answers (1)

akrun

Reputation: 887048

If we don't want to change the function, one option is to split the data by group with group_split and apply the function to each piece:

library(dplyr)
library(purrr)
tmp %>%
   group_split(gene_symbol) %>%
   map_dfr(calc.mean.corr)

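group_split returns an unnamed list, so the result above has no gene_symbol column. To include gene_symbol, use split, which names each list element by group, and pass .id to map_dfr: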

tmp %>%
    split(.$gene_symbol) %>%
    map_dfr(~ calc.mean.corr(.), .id = 'gene_symbol')
#    gene_symbol avg.cor gene_id
#1           a    0.83       x
#2           b    0.02       y
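
As a purrr-free alternative, dplyr's own group_modify() may be the closest analogue of ddply: it applies a function to each group and binds the results, adding the grouping column back. The sketch below is only one way to do this. It assumes the value columns are exactly val1, val2 and val3, and it uses a hypothetical helper, mean_corr_one_group (not part of the original post), in place of calc.mean.corr, because group_modify() by default passes each group without its grouping column.

```r
library(dplyr)

# Example data copied from the question
tmp <- read.table(header = TRUE, text = "gene_id gene_symbol ensembl_id keep val1 val2 val3
x  a Multiple Yes  1  2  3
x1 a Multiple No   2  3  4
x2 a Multiple No   1  4  3
y  b Multiple Yes 22 20 12
y1 b Multiple No  98  7 97
y2 b Multiple No   8 76  6")

# Hypothetical helper (an assumption, not from the original post):
# mean correlation of the kept row with each discarded row of one group.
# Assumes the numeric columns are named val1, val2 and val3.
mean_corr_one_group <- function(x) {
  vals   <- as.matrix(x[, c("val1", "val2", "val3")])
  kept   <- vals[x$keep == "Yes", ]                 # numeric vector (one row)
  others <- vals[x$keep == "No", , drop = FALSE]    # matrix of remaining rows
  data.frame(
    avg.cor = round(mean(apply(others, 1, function(y) cor(kept, y))), 2),
    gene_id = x$gene_id[x$keep == "Yes"]
  )
}

# group_modify() calls the helper once per gene_symbol and
# prepends the grouping column to each result
for.corr <- tmp %>%
  group_by(gene_symbol) %>%
  group_modify(~ mean_corr_one_group(.x)) %>%
  ungroup()

for.corr
```

With the example data this should give the same three columns as the ddply call (gene_symbol, avg.cor, gene_id), returned as a tibble rather than a plain data frame.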

Upvotes: 2
