I have a list of 78 data frames (list_of_df) that all share the same first column, which holds annotated Ensembl transcript IDs. However, the IDs carry a version suffix such as ".1" (e.g. "ENST00000448914.1"), and I would like to remove it so that I can match them against plain ENST IDs.
I have tried using lapply with sapply inside, like this:
lapply(list_of_df, function(x)
  cbind(x, sapply(x$target_id, function(y) unlist(strsplit(y, split = ".", fixed = TRUE))[1])))
but it takes forever. Does anyone have a better idea of how to do this?
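You could simplify your code by splitting the whole column with a single strsplit call instead of one ID at a time, and by returning the modified data frame from the function:
# keep the text before the first "." of each ID, then return the modified data frame
lapply(list_of_df, function(x) { x[, 1] <- sapply(strsplit(x[, 1], split = ".", fixed = TRUE), "[", 1); x })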
If your first column is a factor, wrap x[, 1] in as.character:
# same as above, but convert a factor column to character before splitting
lapply(list_of_df, function(x) { x[, 1] <- sapply(strsplit(as.character(x[, 1]), split = ".", fixed = TRUE), "[", 1); x })
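The per-column step can also be pulled out into a small function. This is a minimal sketch, assuming the IDs arrive as a character or factor vector; the helper name first_piece is hypothetical. It uses vapply instead of sapply so the result is always a character vector, one string per ID:

```r
# Hypothetical helper: keep the part of each ID before the first "."
first_piece <- function(ids) {
  # as.character() also handles factor columns; strsplit() is vectorized,
  # so one call returns a list holding the pieces of every ID
  parts <- strsplit(as.character(ids), split = ".", fixed = TRUE)
  # Take the first piece of each ID; vapply() checks that each result is one string
  vapply(parts, "[", character(1), 1)
}

first_piece(c("ENST00000448914.1", "ENST010393.14"))
# [1] "ENST00000448914" "ENST010393"
```

Applied to the list, this would be lapply(list_of_df, function(x) { x[, 1] <- first_piece(x[, 1]); x }).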
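You could also use the stringi package:
library(stringi)
# n = 1 with tokens_only = TRUE keeps only the first token, i.e. the part before the first "."
lapply(list_of_df, function(x) { x[, 1] <- unlist(stri_split_fixed(x[, 1], ".", n = 1, tokens_only = TRUE)); x })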
We loop through the list of data.frames and use sub to remove the "." followed by digits in the first column.
lapply(list_of_df, function(x) {
  # remove "." followed by one or more digits (the version suffix)
  x[, 1] <- sub("\\.\\d+", "", x[, 1])
  x
})
#[[1]]
# target_id value
#1 ENST000049 39
#2 ENST010393 42
#[[2]]
# target_id value
#1 ENST123434 423
#2 ENST00838 23
NOTE: This should work even if the OP's first column is a factor, because sub coerces its input to character.
Example data:
list_of_df <- list(
  data.frame(target_id = c("ENST000049.1", "ENST010393.14"),
             value = c(39, 42), stringsAsFactors = FALSE),
  data.frame(target_id = c("ENST123434.42", "ENST00838.22"),
             value = c(423, 23), stringsAsFactors = FALSE))
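Since the goal in the question is to match the stripped IDs against plain ENST IDs, here is a minimal sketch of that last step on the same example data (repeated so the block runs on its own). The lookup vector plain_ids is hypothetical, and the pattern is anchored with $ (a small variation on "\\.\\d+") so that only a trailing version suffix is removed; for Ensembl transcript IDs both patterns behave the same.

```r
# Example data, repeated so this block runs on its own
list_of_df <- list(
  data.frame(target_id = c("ENST000049.1", "ENST010393.14"),
             value = c(39, 42), stringsAsFactors = FALSE),
  data.frame(target_id = c("ENST123434.42", "ENST00838.22"),
             value = c(423, 23), stringsAsFactors = FALSE))

# Hypothetical set of unversioned IDs to match against
plain_ids <- c("ENST010393", "ENST00838", "ENST999999")

matched <- lapply(list_of_df, function(x) {
  # Strip a trailing "." plus digits; as.character() also covers factor columns
  x[, 1] <- sub("\\.[0-9]+$", "", as.character(x[, 1]))
  # TRUE where the unversioned ID occurs in plain_ids
  x$in_plain <- x[, 1] %in% plain_ids
  x
})

matched[[1]]$in_plain
# [1] FALSE  TRUE
```

If you need positions rather than TRUE/FALSE, match(x[, 1], plain_ids) returns the index of each ID in plain_ids, or NA where there is no match.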