Reputation: 13
I am working with dN/dS ratios (biology, not important to the question) and have ended up with some artifacts in my data: any value greater than 3 in one particular column is likely unreliable or an error. I need to remove those artifacts before I make a histogram.
I am working with an imported xlsx file. One column in it contains the applicable data.
I have tried the following code:
library(data.table)
outlierReplace = function(dataframe, cols, rows, newValue = NA) {
  if (any(rows)) {
    set(dataframe, rows, cols, newValue)
  }
}
outlierReplace(X23k_Genome_dNdS_For_R,
               `manual dN/dS`,
               which(X23k_Genome_dNdS_For_R$`manual dN/dS` > 3),
               NA)
This returned the following error and warning:
Error in set(dataframe, rows, cols, newValue) :
Can't assign to the same column twice in the same query (duplicates detected).
In addition: Warning message:
In set(dataframe, rows, cols, newValue) :
Coerced j from numeric to integer. Please pass integer for efficiency; e.g., 2L rather than 2
To emphasize, I have 23k rows and 7 columns. I am trying to replace all values in the column "manual dN/dS" that are above 3 with NA.
You may need to install data.table to use the set() function.
Sample data:
dat = data.table("seq1"=c("CAA_0000006-RA", "CAA_0000007-RA"),
                 "seq2"=c("CAB_00000010-RA", "CAB_00000011-RA"),
                 "dN/dS"=c(0.4689, 0.1001), "dN"=c(0.0074, 0.0021),
                 "dS"=c(0.0169,0.0206),
                 "manual dN/dS"=c(0.4379,0.1019),
                 "man. dN/dS w/Nas"=c(0.437869822,0.101941748))
Upvotes: 1
Views: 131
Reputation: 8770
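library(data.table)
setDT(dat)  # convert to a data.table by reference (harmless if it already is one)
# Set every value above 3 to NA, updating the column in place
dat[`manual dN/dS` > 3, `manual dN/dS` := NA]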
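If you would rather keep a helper built on set(), here is a minimal sketch of how the function from the question might be adapted. set() takes the column in j as a character name or an integer position, whereas a backticked name passed as an argument is evaluated as an ordinary R object. The table toy and its value 4.2 are made up for illustration, and the argument names dt and col are my own.

```r
library(data.table)

# Hypothetical toy data; 4.2 is an invented value standing in for an artifact
toy <- data.table(id = c("a", "b", "c"),
                  `manual dN/dS` = c(0.4379, 4.2, 0.1019))

outlierReplace <- function(dt, col, rows, newValue = NA_real_) {
  # rows holds integer row indices, so test its length rather than any(rows)
  if (length(rows) > 0) {
    # j is given as a character column name; set() modifies dt by reference
    set(dt, i = rows, j = col, value = newValue)
  }
  invisible(dt)
}

outlierReplace(toy, "manual dN/dS", which(toy[["manual dN/dS"]] > 3))
```

Because set() updates by reference, toy is changed in place and nothing needs to be reassigned.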
Please note that your example data does not contain the column you mentioned in your question.
Also note that spaces and special characters such as slashes in column names are bad practice, because you always have to quote the names with backticks in your R code.
You can rename a column with, e.g., data.table::setnames(data, "old name", "new name") (see the help page for this function, ?setnames).
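As a rough illustration of the renaming advice, the sketch below renames the column first and then filters without backticks. The table toy, its value 4.2, and the new name manual_dNdS are assumptions chosen for the example, not names from your data.

```r
library(data.table)

# Hypothetical toy data; 4.2 is an invented value standing in for an artifact
toy <- data.table(`manual dN/dS` = c(0.4379, 4.2, 0.1019))

# Rename by reference to a syntactic name (manual_dNdS is an arbitrary choice)
setnames(toy, "manual dN/dS", "manual_dNdS")

# The column can now be used without backticks
toy[manual_dNdS > 3, manual_dNdS := NA]
```

With a syntactic name, the same column can also be reached as toy$manual_dNdS when drawing the histogram.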
Upvotes: 1