Arturo Sbr
Arturo Sbr

Reputation: 6323

Batch reading only certain variables of many files

I have one different CSV file for every month since 2010. They all have the same basic information, but recent files have more variables than the old ones. They are also arranged in different orders.

I want to batch read all the files in my directory as data.tables, select a list of variables from each read file and then compile them into a longitudinal data.table unsing rbind.

I thought of using something along the lines of:

all_files = list.files()
dbs <- lapply(all_files, data.table)

Then select a list of variables from each data.table:

object[ , c("ID", "Date", "Paid", "Pending")]
# These 4 variables have the exact same names throughout all files.

And finally, use rbind to bind all the data.tables together.

Thanks in advance!

Upvotes: 0

Views: 85

Answers (1)

Gautam
Gautam

Reputation: 2753

If the columns are named identically:

library(data.table) 
fnames <- dir(path = choose.dir(), pattern = '.*csv', full.names = T, recursive = F)
cols <- c('ID', 'Date', 'Paid', 'Pending')
data_list <- lapply(fnames, function(z){
  ret <- fread(z, select = cols)
  ret$filename <- z
  return(ret)
})

I stored filename as a column name because you may want to trace back where the data came from (for any discrepancies etc.).

To get it into a single data.table object:

dat <- rbindlist(dat_list, use.names = T, fill = T)

Upvotes: 2

Related Questions