Sebastian Hesse

Reputation: 545

Creating new, variable-name dependent columns in a function (to indicate levels of significance in expression data)

In data frames containing results for differentially expressed proteins, I would like to mark which proteins exceed certain significance thresholds (e.g. logFC > 1 and p < 0.05 as up_0.05, or p < 0.01 as up_0.01). Using ifelse I can do this for each data frame individually, but a function would be much cleaner, as I have many data frames to process this way.

A similar question has been asked (dplyr - mutate: use dynamic variable names), but I was not able to translate it into a solution for my problem, so I would appreciate it very much if you could correct my function's code so that it works (example data provided).

Thanks a lot!

sample data

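p.vals <- seq(from=0, to=1, by=.0001)
# include negative fold changes so that the "down" columns can be TRUE
logFCs <- seq(from=-4, to=4, by=.1)

# LETTERS has only 26 entries, so recycle it and make the names unique
diffEx_proteins <- data.frame(protein=make.unique(rep(LETTERS, length.out=1000)),
                          adj.P.Val=sample(p.vals, size=1000, replace=TRUE),
                          logFC=sample(logFCs, size=1000, replace=TRUE))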

function

mark_significants <- function(comparison){
comparison$paste0(comparison, "up_0.05") <- ifelse(comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1, TRUE, FALSE)
comparison$paste0(comparison, "down_0.05") <- ifelse(comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1, TRUE, FALSE)
comparison$paste0(comparison, "up_0.01") <- ifelse(comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1, TRUE, FALSE)
comparison$paste0(comparison, "down_0.01") <- ifelse(comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1, TRUE, FALSE)
}

usage

mark_significants(diffEx_proteins)

I get the error "Error in mark_significants(diffEx_proteins) : invalid function in complex assignment"

I would like to get the data frame with 4 added logical columns, indicating whether proteins reach the defined threshold levels.

Upvotes: 0

Views: 83

Answers (3)

Sebastian Hesse

Reputation: 545

A working but inelegant solution that passes the data frame and its name as separate arguments:

mark_significants_3 <- function(comparison, name){
    # name is pasted in front of each new column name
    comparison[,paste0(name, "_up_0.05")] <- comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1
    comparison[,paste0(name, "_down_0.05")] <- comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1
    comparison[,paste0(name, "_up_0.001")] <- comparison$adj.P.Val <= 0.001 & comparison$logFC >= 1
    comparison[,paste0(name, "_down_0.001")] <- comparison$adj.P.Val <= 0.001 & comparison$logFC <= -1
    return(comparison)
}

test3 <- mark_significants_3(diffEx_proteins, "diffEx_proteins")
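
Because the name is an ordinary argument here, the same function can be applied to many data frames in one call. A minimal sketch, assuming the data frames are kept in a named list (the list `results`, its element names and the toy values are hypothetical; the function is restated compactly so that the block runs on its own):

```r
# Same thresholds as mark_significants_3, restated so this block is self-contained
mark_significants_3 <- function(comparison, name) {
  up   <- comparison$logFC >= 1
  down <- comparison$logFC <= -1
  comparison[, paste0(name, "_up_0.05")]    <- comparison$adj.P.Val <= 0.05  & up
  comparison[, paste0(name, "_down_0.05")]  <- comparison$adj.P.Val <= 0.05  & down
  comparison[, paste0(name, "_up_0.001")]   <- comparison$adj.P.Val <= 0.001 & up
  comparison[, paste0(name, "_down_0.001")] <- comparison$adj.P.Val <= 0.001 & down
  comparison
}

# Hypothetical inputs: two small result tables in a named list
results <- list(
  ctrl_vs_A = data.frame(adj.P.Val = c(0.0005, 0.04), logFC = c(1.2, -2.0)),
  ctrl_vs_B = data.frame(adj.P.Val = c(0.30, 0.0002), logFC = c(3.0, -1.1))
)

# Map() pairs each data frame with its list name and keeps the names on the result
marked <- Map(mark_significants_3, results, names(results))
names(marked$ctrl_vs_A)
```

Keeping the data frames in a named list avoids typing each name twice, at the cost of having to build the list first.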

Upvotes: 0

nya

Reputation: 2250

The syntax has several problems, which I explain below. Here is the fixed function:

mark_significants <- function(comparison){
    comparison[,"up_0.05"] <- comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1
    comparison[,"down_0.05"] <- comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1
    comparison[,"up_0.01"] <- comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1
    comparison[,"down_0.01"] <- comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1
    return(comparison)
}

test <- mark_significants(diffEx_proteins)
head(test, 3)
#  protein adj.P.Val logFC up_0.05 down_0.05 up_0.01 down_0.01
#1       A    0.9612   1.4   FALSE     FALSE   FALSE     FALSE
#2       B    0.8271   3.1   FALSE     FALSE   FALSE     FALSE
#3       C    0.1829   2.5   FALSE     FALSE   FALSE     FALSE
  1. comparison is a data.frame, not a character string, so paste0 cannot build a column name from it; paste0 simply concatenates character strings. In addition, the $ operator only accepts a literal name, not a function call, which is what triggers the error. I assume that you wanted to add a column, so in my edit I use square brackets with the new column name. Unlike comparison$up_0.05, which would also work here, assigning through square brackets allows the column name to be built dynamically, for example with paste0.
  2. The ifelse function is unnecessary when the result is simply TRUE/FALSE: the comparison is vectorized over the whole column and already returns a logical vector.
  3. Lastly, variables modified inside a function are not changed outside of it unless the result is assigned. Therefore, the function has to output the result through return. To update the original data, use diffEx_proteins <- mark_significants(diffEx_proteins).
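
The three points can be checked on a small example. This is only a sketch: the toy data frame `df` and the helper `add_flag` are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data, only for illustration
df <- data.frame(adj.P.Val = c(0.001, 0.03, 0.20),
                 logFC     = c(2.0, -1.5, 3.0))

# 1. Bracket assignment accepts a computed column name; `$` needs a literal one
col <- paste0("up_", 0.05)   # "up_0.05"
df[[col]] <- df$adj.P.Val <= 0.05 & df$logFC >= 1

# 2. The comparison already yields a logical vector, so ifelse() adds nothing
with_ifelse <- ifelse(df$adj.P.Val <= 0.05 & df$logFC >= 1, TRUE, FALSE)
same <- identical(df[[col]], with_ifelse)

# 3. A function modifies its own copy; the caller's object changes
#    only when the result is assigned back
add_flag <- function(d) {
  d$flag <- TRUE
  d
}
invisible(add_flag(df))                  # result discarded
unchanged <- !("flag" %in% names(df))    # df still has no "flag" column
df <- add_flag(df)                       # now it does
```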

Edit

Following the additional information in the comment, this and this post offer a solution. In short, the name of the data.frame has to be extracted before the data enter the function, because deparse(substitute()) only yields the name when it is applied to the unevaluated symbol; applied to the data themselves, it deparses the whole data.frame. Here, the function accepts the name of the data.frame as a character vector, gets the data from the name and pastes the name onto the column names of the result.

mark_significants <- function(comparison){
    dat <- get(comparison)
    dat[,paste(comparison,"up_0.05", sep = "_")] <- dat$adj.P.Val <= 0.05 & dat$logFC >= 1
    dat[,paste(comparison,"down_0.05", sep = "_")] <- dat$adj.P.Val <= 0.05 & dat$logFC <= -1
    dat[,paste(comparison,"up_0.01", sep = "_")] <- dat$adj.P.Val <= 0.01 & dat$logFC >= 1
    dat[,paste(comparison,"down_0.01", sep = "_")] <- dat$adj.P.Val <= 0.01 & dat$logFC <= -1
    return(dat)
}

test1 <- mark_significants(deparse(substitute(diffEx_proteins)))
test2 <- mark_significants("diffEx_proteins")
identical(test1, test2)
# [1] TRUE
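
The four nearly identical assignments can also be generated in a loop. A minimal sketch, assuming the prefix is passed as a string; the helper `mark_by_threshold`, its `cutoffs` argument and the toy data are hypothetical and not part of the answer above:

```r
# Hypothetical helper: one pair of up/down columns per p-value cutoff
mark_by_threshold <- function(dat, prefix, cutoffs = c(0.05, 0.01)) {
  for (p in cutoffs) {
    sig <- dat$adj.P.Val <= p
    # paste() converts the numeric cutoff to text, e.g. "toy_up_0.05"
    dat[[paste(prefix, "up",   p, sep = "_")]] <- sig & dat$logFC >= 1
    dat[[paste(prefix, "down", p, sep = "_")]] <- sig & dat$logFC <= -1
  }
  dat
}

# Toy data, only for illustration
toy <- data.frame(adj.P.Val = c(0.005, 0.03, 0.5),
                  logFC     = c(1.5, -2, 2))
out <- mark_by_threshold(toy, "toy")
names(out)
```

Adding a further cutoff then only requires changing `cutoffs`, not the body of the function.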

Upvotes: 2

Sebastian Hesse

Reputation: 545

Thank you very much, nya, that put me on the right track to the solution! I only wanted to add the name of the "comparison" to the new columns, as I am using the column names later on for a Venn diagram.

Here is my modified version of your function that includes "comparison" in the column names (your hint about comparison being a data frame helped me work out its correct usage):

mark_significants_2 <- function(comparison){
 comparison[,paste0("comparison","_up_0.05")] <- comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1
 comparison[,paste0("comparison","_down_0.05")] <- comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1
 comparison[,paste0("comparison","_up_0.01")] <- comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1
 comparison[,paste0("comparison","_down_0.01")] <- comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1
 return(comparison)
}
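
Note that paste0("comparison", "_up_0.05") uses the literal string "comparison", so every data frame gets the same prefix, comparison_up_0.05 and so on. A minimal sketch of one way to pick up the caller's variable name instead, assuming the function is always called directly with a plain variable name (the function name `mark_significants_named` and the toy data are hypothetical):

```r
mark_significants_named <- function(comparison) {
  # Capture the caller's expression first: once `comparison` is modified
  # inside the function, substitute() no longer sees the original symbol
  name <- deparse(substitute(comparison))
  up   <- comparison$logFC >= 1
  down <- comparison$logFC <= -1
  comparison[[paste0(name, "_up_0.05")]]   <- comparison$adj.P.Val <= 0.05 & up
  comparison[[paste0(name, "_down_0.05")]] <- comparison$adj.P.Val <= 0.05 & down
  comparison[[paste0(name, "_up_0.01")]]   <- comparison$adj.P.Val <= 0.01 & up
  comparison[[paste0(name, "_down_0.01")]] <- comparison$adj.P.Val <= 0.01 & down
  comparison
}

# Toy data, only for illustration
diffEx_toy <- data.frame(adj.P.Val = c(0.01, 0.5), logFC = c(2, -3))
res <- mark_significants_named(diffEx_toy)
names(res)
```

This only gives a useful prefix when the argument is a plain variable name; when the function is called indirectly, for example through lapply, the captured expression is not the original name.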

I'm slowly getting into writing functions, and your hints were a great help in understanding the syntax issues!

Upvotes: 0
