I have multiple text files (around 60) that I merge into a single file. I am looking for a way to add only the first 4 digits of each file's name as a variable. An example of a file name is 1111_2222_3333.txt.
So basically I need an additional variable that holds the first 4 digits of the file name for each file.
I found the following related topics, but they do not let me include only the first four digits: "How can I turn the filename into a variable when reading multiple csvs into R" and "R: Read multiple files and label them based on the file name".
My current code, which does not yet include the file name, is:
files <- list.files("pathname", pattern = "\\.txt$", ignore.case = TRUE)
masterfilesales <- do.call(rbind, lapply(file.path("pathname", files), read.table))
Update: Although the initial answer is correct, the same goal can be achieved in fewer steps by using sapply with simplify=FALSE instead of lapply, because sapply automatically assigns the file names to the elements of the list:
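library(data.table)
files <- list.files("pathname", pattern = "\\.txt$", ignore.case = TRUE)
# read each file; sapply names the list elements after the (short) file names
file.list <- sapply(files, function(f) read.table(file.path("pathname", f)), simplify = FALSE)
# idcol = "id" stores the list names in a column; keep only the first 4 characters
masterfilesales <- rbindlist(file.list, idcol = "id")[, id := substr(id, 1, 4)]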
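The naming behaviour that makes this work can be checked without any files. A minimal sketch, with made-up file names and nchar standing in for read.table:

```r
# Made-up file names standing in for the real .txt files (hypothetical)
files <- c("1111_2222_3333.txt", "4444_5555_6666.txt")

# lapply() on an unnamed character vector returns an unnamed list
via_lapply <- lapply(files, nchar)
print(names(via_lapply))  # NULL

# sapply() with USE.NAMES = TRUE (the default) names the result after the
# input strings; simplify = FALSE keeps the result as a list
via_sapply <- sapply(files, nchar, simplify = FALSE)
print(names(via_sapply))
```

Those names are what idcol = "id" (or .id in dplyr) turns into a column.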
Old answer: To achieve what you want, you can combine the setattr function and the idcol parameter of the rbindlist function from the data.table package as follows:
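library(data.table)
files <- list.files("pathname", pattern = "\\.txt$", ignore.case = TRUE)
file.list <- lapply(file.path("pathname", files), read.table)
# name the list elements (by reference) with the short file names
setattr(file.list, "names", files)
masterfilesales <- rbindlist(file.list, idcol = "id")[, id := substr(id, 1, 4)]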
Alternatively, you can set the filenames in base R with:
attr(file.list, "names") <- files
or:
names(file.list) <- files
and bind them together with bind_rows from the dplyr package (which also has an .id parameter to create an id column):
library(dplyr)
masterfilesales <- bind_rows(file.list, .id = "id") %>% mutate(id = substr(id, 1, 4))
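If you would rather avoid both packages, the same idea can be sketched in base R only. This is a swapped-in approach (Map plus do.call(rbind, ...)), not part of the answer above; the temporary directory, file names and columns are made up so the example runs on its own:

```r
# Create a throwaway directory with two sample files (hypothetical data)
dir <- file.path(tempdir(), "sales_demo")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(qty = 1:2, price = c(10, 20)),
            file.path(dir, "1111_2222_3333.txt"), row.names = FALSE)
write.table(data.frame(qty = 3L, price = 30),
            file.path(dir, "4444_5555_6666.txt"), row.names = FALSE)

# List the .txt files (case-insensitive) and read each one
files <- list.files(dir, pattern = "\\.txt$", ignore.case = TRUE)
file.list <- lapply(file.path(dir, files), read.table, header = TRUE)

# Prepend an id column holding the first four characters of each file name
file.list <- Map(function(df, nm) cbind(id = substr(nm, 1, 4), df),
                 file.list, files)

# Stack all data frames into one
masterfilesales <- do.call(rbind, unname(file.list))
print(masterfilesales)
```

Map pairs each data frame with its file name, so no list names are needed at all; the trade-off is more typing than rbindlist or bind_rows.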
Are you looking for something like this?
a <- c("1111_444.txt", "443343iqueh.txt")
substring(a, first = 1, last = 4)  # "1111" "4433"
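If some file names might not start with four digits, substring will still return the first four characters, digits or not. A stricter sketch (an assumption about what you might want, not part of the answer) uses a regular expression and returns NA when there is no four-digit prefix:

```r
# The third name is a made-up example without a numeric prefix
a <- c("1111_444.txt", "443343iqueh.txt", "ab12_notes.txt")

# substring() always returns the first four characters
substring(a, first = 1, last = 4)

# Keep exactly four leading digits, NA when the name does not start with them
prefix <- ifelse(grepl("^[0-9]{4}", a),
                 sub("^([0-9]{4}).*$", "\\1", a),
                 NA_character_)
print(prefix)
```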