Reputation: 545
In dfs containing results for differentially expressed proteins, I would like to mark which proteins exceed certain significance thresholds (e.g. logFC > 1 & p < 0.05 as up_0.05, or p < 0.01 as up_0.01). Using ifelse I can do this for each df individually, but it would be much cleaner to have a function, as I have many dfs to process this way.
A similar question has been asked (dplyr - mutate: use dynamic variable names), but I was not able to translate it into a solution for my problem, so I would appreciate it very much if you could correct my function's code so that it works (example data provided).
Thanks a lot!
p.vals <- seq(from=0, to=1, by=.0001)
logFCs <- seq(from=0, to=4, by=.1)
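# LETTERS has only 26 entries, so LETTERS[1:1000] would be NA from row 27 on;
# recycle the letters and make the resulting names unique instead
diffEx_proteins <- data.frame(protein=make.unique(rep_len(LETTERS, 1000)),
                              adj.P.Val=sample(p.vals, size=1000, replace=TRUE),
                              logFC=sample(logFCs, size=1000, replace=TRUE))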
mark_significants <- function(comparison){
comparison$paste0(comparison, "up_0.05") <- ifelse(comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1, TRUE, FALSE)
comparison$paste0(comparison, "down_0.05") <- ifelse(comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1, TRUE, FALSE)
comparison$paste0(comparison, "up_0.01") <- ifelse(comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1, TRUE, FALSE)
comparison$paste0(comparison, "down_0.01") <- ifelse(comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1, TRUE, FALSE)
}
mark_significants(diffEx_proteins)
I get the error "Error in mark_significants(diffEx_proteins) : invalid function in complex assignment"
I would like to get the df with 4 added logical columns, indicating whether each protein reaches the defined threshold levels.
Upvotes: 0
Views: 83
Reputation: 545
Working but inelegant solution, passing the dataframe and its name separately:
mark_significants_3 <- function(comparison, name){
  comparison[,paste0(name, "_up_0.05")] <- comparison$adj.P.Val <= 0.05 &
    comparison$logFC >= 1
  comparison[,paste0(name, "_down_0.05")] <- comparison$adj.P.Val <= 0.05 &
    comparison$logFC <= -1
  comparison[,paste0(name, "_up_0.001")] <- comparison$adj.P.Val <= 0.001 &
    comparison$logFC >= 1
  comparison[,paste0(name, "_down_0.001")] <- comparison$adj.P.Val <= 0.001 &
    comparison$logFC <= -1
  return(comparison)
}
test3 <- mark_significants_3(diffEx_proteins, "diffEx_proteins")
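The repetition in the function above could probably be reduced by looping over the cut-offs. The following is only a sketch, not part of the original posts: the function name mark_significants_loop, the arguments p_cutoffs and lfc_cutoff, and the toy data are all made up for illustration, and the default cut-offs are the 0.05 and 0.01 mentioned in the question.

```r
# Hypothetical helper: one loop iteration per p-value cut-off instead of
# one hand-written line per column.
mark_significants_loop <- function(comparison, name,
                                   p_cutoffs = c(0.05, 0.01),
                                   lfc_cutoff = 1) {
  for (p in p_cutoffs) {
    sig <- comparison$adj.P.Val <= p
    # [[<- accepts a computed column name, unlike $<-
    comparison[[paste0(name, "_up_", p)]]   <- sig & comparison$logFC >=  lfc_cutoff
    comparison[[paste0(name, "_down_", p)]] <- sig & comparison$logFC <= -lfc_cutoff
  }
  comparison
}

# Small made-up example data
toy <- data.frame(protein   = c("A", "B", "C"),
                  adj.P.Val = c(0.001, 0.03, 0.20),
                  logFC     = c(1.5, -2.0, 3.0))
res <- mark_significants_loop(toy, "toy")
names(res)
```

Note that the cut-off is converted to text when the column name is built, so very small values may end up in scientific notation in the name; passing the labels explicitly would avoid that.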
Upvotes: 0
Reputation: 2250
There are several problems with the syntax, which I explain below. Here is the fixed function:
mark_significants <- function(comparison){
  comparison[,"up_0.05"] <- comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1
  comparison[,"down_0.05"] <- comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1
  comparison[,"up_0.01"] <- comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1
  comparison[,"down_0.01"] <- comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1
  return(comparison)
}
test <- mark_significants(diffEx_proteins)
head(test, 3)
#   protein adj.P.Val logFC up_0.05 down_0.05 up_0.01 down_0.01
# 1       A    0.9612   1.4   FALSE     FALSE   FALSE     FALSE
# 2       B    0.8271   3.1   FALSE     FALSE   FALSE     FALSE
# 3       C    0.1829   2.5   FALSE     FALSE   FALSE     FALSE
comparison is a data.frame, and thus the function paste0 does not know what to paste; in essence, it collates character strings. I assume that you wanted to add a column, and in my edit I use square brackets with a new name to do so. Unlike calling comparison$up_0.05, which would also work here, adding a new column from within the square brackets enables dynamic naming of the column, such as through the paste0 function.
The ifelse function is not necessary if the result is TRUE/FALSE; the comparison is vectorized and can be applied directly to the whole column.
R has to be told to output the result of the function through return. To directly modify the original data, you can use diffEx_proteins <- mark_significants(diffEx_proteins).
Following additional information in the comment, this and this post offer a solution. In short, the name of the data.frame has to be extracted before the data enter the function, otherwise deparse(substitute()) returns the whole data.frame. Here, the function will accept the name of the data.frame as a character vector, get the data from the name and paste the name to the column names of the result.
mark_significants <- function(comparison){
  dat <- get(comparison)
  dat[,paste(comparison,"up_0.05", sep = "_")] <- dat$adj.P.Val <= 0.05 & dat$logFC >= 1
  dat[,paste(comparison,"down_0.05", sep = "_")] <- dat$adj.P.Val <= 0.05 & dat$logFC <= -1
  dat[,paste(comparison,"up_0.01", sep = "_")] <- dat$adj.P.Val <= 0.01 & dat$logFC >= 1
  dat[,paste(comparison,"down_0.01", sep = "_")] <- dat$adj.P.Val <= 0.01 & dat$logFC <= -1
  return(dat)
}
test1 <- mark_significants(deparse(substitute(diffEx_proteins)))
test2 <- mark_significants("diffEx_proteins")
identical(test1, test2)
# [1] TRUE
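Since the question mentions many data frames, one possible alternative to get() is to keep them in a named list and hand each element to the function together with its name. This is only a sketch under assumptions: mark_named, the list comparisons, its element names and the toy values are all hypothetical and not part of the original question.

```r
# Hypothetical variant: the data and its label are passed separately,
# so no lookup by name (get) is needed.
mark_named <- function(dat, label) {
  dat[[paste(label, "up_0.05", sep = "_")]]   <- dat$adj.P.Val <= 0.05 & dat$logFC >= 1
  dat[[paste(label, "down_0.05", sep = "_")]] <- dat$adj.P.Val <= 0.05 & dat$logFC <= -1
  dat[[paste(label, "up_0.01", sep = "_")]]   <- dat$adj.P.Val <= 0.01 & dat$logFC >= 1
  dat[[paste(label, "down_0.01", sep = "_")]] <- dat$adj.P.Val <= 0.01 & dat$logFC <= -1
  dat
}

# Made-up comparisons stored in a named list
comparisons <- list(
  treatA_vs_ctrl = data.frame(protein = c("A", "B"),
                              adj.P.Val = c(0.02, 0.50),
                              logFC = c(1.2, -3.0)),
  treatB_vs_ctrl = data.frame(protein = c("A", "B"),
                              adj.P.Val = c(0.004, 0.03),
                              logFC = c(-1.5, 0.2))
)

# Map() walks over the data frames and their names in parallel and
# returns a list that keeps the original names.
marked <- Map(mark_named, comparisons, names(comparisons))
names(marked[["treatA_vs_ctrl"]])
```

Keeping the comparisons in one list also means the marked results can be combined or inspected in one step later on.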
Upvotes: 2
Reputation: 545
Thank you very much nya, that put me on the right track to the solution! I only wanted to add the name of the "comparison" to the new columns, as I am using the colnames later on for a Venn diagram.
Here is my modified version of your function that includes "comparison" in the colnames (your hint about comparison being a dataframe helped me work out how to use it correctly):
mark_significants_2 <- function(comparison){
  comparison[,paste0("comparison","_up_0.05")] <- comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1
  comparison[,paste0("comparison","_down_0.05")] <- comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1
  comparison[,paste0("comparison","_up_0.01")] <- comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1
  comparison[,paste0("comparison","_down_0.01")] <- comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1
  return(comparison)
}
Slowly I'm getting into writing functions; your hints were great for understanding the syntax issues!
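One caveat with the version above: paste0("comparison", ...) pastes the literal text "comparison", so every data frame gets the same column names. If the variable's own name should appear instead, a sketch along the following lines might work. It assumes the function is called directly with a plain variable name (through lapply() it would not see the original name), and mark_significants_named and toy are hypothetical names used only for illustration.

```r
# Hypothetical variant: capture the caller's variable name inside the function.
mark_significants_named <- function(comparison) {
  # Must happen before 'comparison' is modified, while the argument still
  # carries the expression it was called with.
  name <- deparse(substitute(comparison))
  comparison[[paste0(name, "_up_0.05")]]   <- comparison$adj.P.Val <= 0.05 & comparison$logFC >= 1
  comparison[[paste0(name, "_down_0.05")]] <- comparison$adj.P.Val <= 0.05 & comparison$logFC <= -1
  comparison[[paste0(name, "_up_0.01")]]   <- comparison$adj.P.Val <= 0.01 & comparison$logFC >= 1
  comparison[[paste0(name, "_down_0.01")]] <- comparison$adj.P.Val <= 0.01 & comparison$logFC <= -1
  comparison
}

# Made-up example data
toy <- data.frame(protein   = c("A", "B"),
                  adj.P.Val = c(0.02, 0.004),
                  logFC     = c(1.5, -2.0))
res <- mark_significants_named(toy)
names(res)
```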
Upvotes: 0