user3346285

Reputation: 101

Change column in data frames in list

I have a list of 78 data frames (list_of_df) that all have the same first column, containing annotated Ensembl transcript IDs. However, the IDs carry a version extension such as ".1" (e.g. "ENST00000448914.1"), and I would like to remove it so that I can match them against plain ENST IDs.

I have tried to use lapply with a sapply inside like this:

lapply(list_of_df, function(x)  
                 cbind(x,sapply(x$target_id, function(y) unlist(strsplit(y,split=".",fixed=T))[1])) ) 

but it takes forever. Does anyone have a better idea of how to do this?

Upvotes: 2

Views: 92

Answers (2)

Jaap

Reputation: 83215

You could simplify your code to:

lapply(list_of_df, function(x) {
  # keep the part before the first "."
  x[, 1] <- sapply(strsplit(x[, 1], split = ".", fixed = TRUE), `[`, 1)
  x
})

If the first column is a factor, wrap x[, 1] in as.character:

lapply(list_of_df, function(x) {
  x[, 1] <- sapply(strsplit(as.character(x[, 1]), split = ".", fixed = TRUE), `[`, 1)
  x
})

You could also make use of the stringi package:

library(stringi)
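lapply(list_of_df, function(x) {
  # n = 1 with tokens_only = TRUE keeps only the first token of each ID
  x[, 1] <- stri_split_fixed(x[, 1], ".", n = 1, tokens_only = TRUE, simplify = TRUE)[, 1]
  x
})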
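
As a quick sanity check, here is a minimal sketch of per-element extraction in base R. The ids vector is made up for illustration and is not from the question. vapply is used in place of sapply only to make the return type explicit.

```r
# Hypothetical example IDs; real Ensembl transcript IDs are longer.
ids <- c("ENST000049.1", "ENST010393.14")

# strsplit() returns one character vector per input element;
# take the first piece of each one.
first_piece <- vapply(strsplit(ids, ".", fixed = TRUE),
                      function(parts) parts[1],
                      character(1))
print(first_piece)
```

Calling unlist() on the whole strsplit() result and then taking [1] would instead return only the first piece of the first ID, so the extraction has to happen per element.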

Upvotes: 1

akrun

Reputation: 886938

We loop through the list of data.frames and use sub to remove the dot and the digits that follow it in the first column.

lapply(list_of_df, function(x) {
  x[, 1] <- sub('\\.\\d+', '', x[, 1])
  x
})

#[[1]]
#   target_id value
#1 ENST000049    39
#2 ENST010393    42

#[[2]]
#   target_id value
#1 ENST123434   423
#2  ENST00838    23

NOTE: This should work even if the OP's first column is a factor, because sub coerces its input to character.
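
The factor case can be checked with a small sketch. The factor f below is a hypothetical stand-in for a column read with stringsAsFactors = TRUE, and the pattern is anchored with $ here as an extra precaution that is not part of the answer's code.

```r
# Hypothetical factor column, as produced by stringsAsFactors = TRUE.
f <- factor(c("ENST000049.1", "ENST010393.14"))

# sub() coerces its input with as.character(), so the result is a
# plain character vector rather than a factor.
cleaned <- sub("\\.\\d+$", "", f)
print(cleaned)
print(class(cleaned))
```

One consequence is that assigning the result back to x[, 1] turns a factor column into a character column, which is usually what you want before matching IDs.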

data

list_of_df <- list(data.frame(target_id= c("ENST000049.1", 
   "ENST010393.14"), value= c(39, 42), stringsAsFactors=FALSE), 
  data.frame(target_id=c("ENST123434.42", "ENST00838.22"), 
   value= c(423, 23), stringsAsFactors=FALSE))
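
To tie this back to the goal in the question (matching against plain ENST IDs), here is a rough sketch on the same sample data. The pure_ids vector is a hypothetical lookup set, and anchoring the pattern with $ is an assumption about where the version suffix sits.

```r
# Sample data in the same shape as above.
list_of_df <- list(
  data.frame(target_id = c("ENST000049.1", "ENST010393.14"),
             value = c(39, 42), stringsAsFactors = FALSE),
  data.frame(target_id = c("ENST123434.42", "ENST00838.22"),
             value = c(423, 23), stringsAsFactors = FALSE)
)

# Hypothetical vector of unversioned IDs to match against.
pure_ids <- c("ENST000049", "ENST00838")

# Strip the version suffix and assign the result back;
# lapply() does not modify list_of_df in place.
list_of_df <- lapply(list_of_df, function(x) {
  x[, 1] <- sub("\\.\\d+$", "", x[, 1])
  x
})

# Keep only the rows whose ID appears in pure_ids.
matched <- lapply(list_of_df, function(x) x[x$target_id %in% pure_ids, ])
print(matched)
```

Since sub is vectorised over the whole column, this avoids the per-row sapply call from the question, which is the likely source of the slowdown.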

Upvotes: 2
