research111

Reputation: 357

How can I turn a part of the filename into a variable when reading multiple text files?

I have multiple text files (around 60) that I merge into a single file. I am looking for a way to add only the first 4 digits of each file name as a variable. An example of a file name is 1111_2222_3333.txt.

So basically I need an additional variable that includes the first 4 digits of the file name per file.

I did find the following related questions, but their answers do not show how to keep only the first four digits:

How can I turn the filename into a variable when reading multiple csvs into R

R: Read multiple files and label them based on the file name

My current code, which does not yet include the file name, is:

files <- list.files("pathname", pattern="*.TXT")
masterfilesales <- do.call(rbind, lapply(files, read.table))

Upvotes: 3

Views: 2314

Answers (2)

Jaap

Reputation: 83275

Update: Although the initial answer is correct, the same goal can be achieved in fewer steps by using sapply with simplify=FALSE instead of lapply. When its input is a character vector, sapply names the elements of the resulting list after that input (USE.NAMES=TRUE is the default), so the filenames are attached automatically:

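library(data.table)

# pattern is a regular expression, not a glob; full.names=TRUE returns paths read.table can open
files <- list.files("pathname", pattern="\\.txt$", ignore.case=TRUE, full.names=TRUE)
file.list <- sapply(files, read.table, simplify=FALSE)
# basename() drops the directory part before the first 4 characters are taken
masterfilesales <- rbindlist(file.list, idcol="id")[, id := substr(basename(id),1,4)]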
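The naming behaviour that the update relies on can be checked in isolation with base R only. This is a minimal sketch, not part of the original answer: the temporary directory, the two example files, and the variable ids are made up for illustration, and header=TRUE is assumed because the demo files are written with a header row.

```r
# Create two small example files in a temporary directory (hypothetical names)
tmp <- file.path(tempdir(), "sapply_names_demo")
dir.create(tmp, showWarnings = FALSE)
write.table(data.frame(x = 1:2), file.path(tmp, "1111_2222_3333.txt"), row.names = FALSE)
write.table(data.frame(x = 3:4), file.path(tmp, "4444_5555_6666.txt"), row.names = FALSE)

# full.names = TRUE returns paths that read.table can open from any working directory
files <- list.files(tmp, pattern = "\\.txt$", ignore.case = TRUE, full.names = TRUE)

# sapply() names the result after its character input (USE.NAMES = TRUE by default)
file.list <- sapply(files, read.table, header = TRUE, simplify = FALSE)

# Strip the directory part so each name starts with the four digits
names(file.list) <- basename(names(file.list))
ids <- substr(names(file.list), 1, 4)
print(ids)
```

With lapply the list would come back unnamed, which is why the older version of the answer needs a separate step to set the names.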

Old answer: To achieve what you want, you can use a combination of the setattr function and the idcol parameter of the rbindlist function from the data.table package as follows:

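library(data.table)

# pattern is a regular expression, not a glob; full.names=TRUE returns paths read.table can open
files <- list.files("pathname", pattern="\\.txt$", ignore.case=TRUE, full.names=TRUE)
file.list <- lapply(files, read.table)
setattr(file.list, "names", files)
# basename() drops the directory part before the first 4 characters are taken
masterfilesales <- rbindlist(file.list, idcol="id")[, id := substr(basename(id),1,4)]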

Alternatively, you can set the filenames in base R with:

attr(file.list, "names") <- files

or:

names(file.list) <- files

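and bind them together with bind_rows from the dplyr package (which also has an .id parameter to create an id column):

library(dplyr)

# basename() leaves bare filenames unchanged and strips the directory from full paths
masterfilesales <- bind_rows(file.list, .id="id") %>% mutate(id = substr(basename(id),1,4))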
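For comparison, the same result can be sketched without any packages by adding the id column while each file is read. This is an alternative to the approaches above, not what the answer proposes: the helper read_with_id, the temporary directory, and the example files are hypothetical, and header=TRUE is assumed because the demo files are written with a header row.

```r
# Create two small example files in a temporary directory (hypothetical names)
tmp <- file.path(tempdir(), "base_r_id_demo")
dir.create(tmp, showWarnings = FALSE)
write.table(data.frame(x = 1:2), file.path(tmp, "1111_2222_3333.txt"), row.names = FALSE)
write.table(data.frame(x = 3:4), file.path(tmp, "4444_5555_6666.txt"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.txt$", ignore.case = TRUE, full.names = TRUE)

# Hypothetical helper: read one file and tag every row with the
# first four characters of its file name
read_with_id <- function(path) {
  df <- read.table(path, header = TRUE)
  df$id <- substr(basename(path), 1, 4)
  df
}

# Stack the per-file data frames into one
masterfilesales <- do.call(rbind, lapply(files, read_with_id))
print(masterfilesales)
```

Because the id is attached inside the helper, no list names are needed, at the cost of rbind being slower than rbindlist or bind_rows when there are many large files.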

Upvotes: 4

Marta

Reputation: 3162

Are you looking for something like this?

a <- c("1111_444.txt", "443343iqueh.txt")

substring(a, first=1, last=4)
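If the file names carry a directory part, or if some of them might not start with four digits, the substring call can be combined with basename and a simple check. This is a small sketch under those assumptions; the example vector and the names prefix and is_digits are made up for illustration.

```r
# Example names: two from above, one with a directory, one without leading digits
a <- c("1111_444.txt", "443343iqueh.txt", "data/2024_report.txt", "ab12_x.txt")

# basename() removes any directory part before the first 4 characters are taken
prefix <- substring(basename(a), first = 1, last = 4)

# Flag prefixes that really consist of four digits
is_digits <- grepl("^[0-9]{4}$", prefix)

print(prefix)
print(is_digits)
```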

Upvotes: 0
