Konrad Weber

Reputation: 157

In R, read files from a folder into a list and name the list elements by file name without the file extension (.fa)

I'm reading FASTA files from a folder into a list. Each list element should be named after its file, without the .fa extension.

I'm using list.files to list the files in the directory "Folder"

filenames <- list.files("Folder", pattern = "\\.fa$", full.names = TRUE)

and then read the FASTA files in.

list <- lapply(filenames, FUN = readDNAStringSet, use.names = TRUE, format = "fasta")

I found this code, which uses setNames to define the list element names.

list <- setNames(list, substr(list.files("Folder", pattern = ".fa"), 1, 15))

But my file names have different lengths, which makes fixed START and STOP positions (1, 15) impractical, and for further processing I would like to get rid of the .fa extension.

The files would look like:

Gene1.fa
Gene12.fa
Gene22a.fa
Gene123abc.fa

I'm using DECIPHER, but I guess this is more of a base R question?

Upvotes: 3

Views: 1298

Answers (1)

akrun

Reputation: 887098

In order to remove the substring at the end, we could use substring as well, but compute the last position from the length of each string (nchar) instead of using a fixed value, because the name lengths vary

v1 <- list.files("Folder", pattern = "\\.fa$")
substring(v1, first = 1, last = nchar(v1) - 3)
#[1] "Gene1"      "Gene12"     "Gene22a"    "Gene123abc"

Another option is sub with a regular expression: match a literal dot (. is a metacharacter that matches any character, so escape it with \\ to get the literal meaning) followed by 'fa' at the end ($) of the string, and replace the match with an empty string ("")

sub("\\.fa$", "", v1)
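To connect this back to the question, here is a minimal sketch of how the stripped names could be passed to setNames. The temporary folder, the dummy file contents, and the use of readLines as a stand-in reader are assumptions made only so the example runs without Biostrings; with the package loaded, readDNAStringSet(f, format = "fasta") would presumably take its place. tools::file_path_sans_ext is shown as an alternative to the regex.

```r
# Hypothetical setup: a fresh temporary folder holding small dummy FASTA
# files named like the examples in the question.
folder <- tempfile("Folder")
dir.create(folder)
for (f in c("Gene1.fa", "Gene12.fa", "Gene22a.fa", "Gene123abc.fa")) {
  writeLines(c(">seq1", "ACGT"), file.path(folder, f))
}

# Escaped and anchored pattern, so only names ending in ".fa" are listed.
filenames <- list.files(folder, pattern = "\\.fa$", full.names = TRUE)

# Stand-in reader (assumption): readLines replaces readDNAStringSet here.
seqs <- lapply(filenames, readLines)

# Drop the directory part with basename(), then strip the extension.
seqs <- setNames(seqs, sub("\\.fa$", "", basename(filenames)))
names(seqs)

# Alternative without a regex: file_path_sans_ext() from the tools package.
alt_names <- tools::file_path_sans_ext(basename(filenames))
```

Because the names are derived from the same filenames vector that was read, they stay aligned with the list elements regardless of how list.files orders the files.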

Upvotes: 2
