Mus

Reputation: 7530

How can I apply a function to a set of .csv files in a particular directory?

Despite lots of research and several efforts using lapply (I think/hope that's the correct apply function), I have been unable to achieve the following and would like some guidance. What I want to do is read in all files in a single directory, merge them all into a single dataframe, making sure that each file has the first seven rows deleted before the merge.

(Note that all files contain the same column headings and contain the same datatypes.)

I have tried this, but it clearly falls short of everything I want to achieve:

files <- list.files(pattern = "*.csv") # Gather a list of everything in the directory that is a .csv file.
aconex <- lapply(files, fread) # Use lapply (I think this is correct) to apply the fread() function (from the data.table package) to each .csv file

This results in everything being stored in a list (one element per file), whereas I want the output to be a single data frame.

There has to be a better approach - I just can't seem to figure it out.

Can anybody suggest a better solution?

UPDATE:

Alternatively, I have written a for loop which partially achieves what I want; the problem is that it only saves a single file's worth of data to the data frame (there are 15 files in total):

for(x in list.files(pattern = "*.csv")){
  df <- data.table::fread(x)
  df <- df[-(1:7), ]
  colnames(df) <- as.character(unlist(df[1,]))
  df <- df[-(1), ]
}

Once the first seven rows have been removed, I then apply the first row as column names and then remove the first row. Again, what is a better way to achieve this?

Ideally, I want the resulting output to be x-number of data frames (df1, df2, ..., dfX) that I can then merge, but, again, there has to be a better way - what is it?

Put simply, I want each file to be read into its own data frame, then for the value of row 8 to be used as the column headings, then the first eight rows removed (I only kept the eighth row in order to use it for the column headings before removing it).

Upvotes: 0

Views: 584

Answers (1)

Nick Criswell

Reputation: 1743

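This can be done with an anonymous function that reads each file with read.csv and drops the first seven rows via the skip argument. With skip = 7 and header = TRUE, line 8 of each file is used for the column names, so there is no need to set them by hand. You can then stick all the data.frames together with do.call and rbind.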

files <- list.files(pattern = "\\.csv$") # pattern is a regular expression: match file names ending in .csv

#create f, which is a list of data frames
f <- lapply(files, function(m) read.csv(m, skip = 7, header = TRUE))

#stick them all together with do.call-rbind
f_combine <- do.call("rbind", f)
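To try this without touching real data, here is a minimal, self-contained sketch of the same idea. The directory name, file names and column names (id, value) are made up for the demo; only read.csv, lapply and do.call come from the approach above. Naming the list elements is an optional extra that keeps each file available as its own data frame.

```r
# Hypothetical demo data: two small files, each with 7 junk lines before the header.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:2) {
  writeLines(
    c(paste("junk line", 1:7), "id,value", paste0(i, ",", i * 10)),
    file.path(demo_dir, paste0("file", i, ".csv"))
  )
}

# "\\.csv$" is a regular expression matching names that end in ".csv".
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file into its own data frame, skipping the first seven lines;
# line 8 becomes the header.
f <- lapply(files, read.csv, skip = 7, header = TRUE)
names(f) <- basename(files)  # remember which data frame came from which file

# Stack the per-file data frames into one.
f_combine <- do.call(rbind, f)
```

Individual files remain accessible as f[["file1.csv"]] and so on, which covers the "one data frame per file" case without creating df1, df2, ... by hand.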

If you do need the speed provided by data.table::fread, you could modify the code as follows:

#create f, which is a list of data.tables; modified with fread from data.table
library(data.table)
f <- lapply(files, function(m) fread(m, skip = 7))

#use rbindlist this time
f_combine <- rbindlist(f)
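As for the for loop in the question's update: it assigns to the same df on every iteration, so only the last file's data survives. If you prefer a loop over lapply, a rough sketch is to store each result in its own list slot and combine at the end. The demo directory, file names and columns below are hypothetical and exist only to make the example runnable.

```r
# Hypothetical demo data: three files, each with 7 junk lines before the header.
loop_dir <- file.path(tempdir(), "csv_loop_demo")
dir.create(loop_dir, showWarnings = FALSE)
for (i in 1:3) {
  writeLines(
    c(paste("junk line", 1:7), "id,value", paste0(i, ",", i * 10)),
    file.path(loop_dir, paste0("part", i, ".csv"))
  )
}

files <- list.files(loop_dir, pattern = "\\.csv$", full.names = TRUE)

# Preallocate one list slot per file so no iteration overwrites another.
dfs <- vector("list", length(files))
for (i in seq_along(files)) {
  dfs[[i]] <- read.csv(files[i], skip = 7, header = TRUE)
}

# Combine once, after the loop has finished.
combined <- do.call(rbind, dfs)
```

This does the same work as the lapply version; lapply is just the more compact way to build the list.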

Upvotes: 1
