Despite lots of research and several attempts using lapply (I think/hope that's the correct apply function), I have been unable to achieve the following and would like some guidance. I want to read in all files in a single directory and merge them into a single data frame, with the first seven rows of each file deleted before the merge.
(Note that all files contain the same column headings and contain the same datatypes.)
I have tried this, but it clearly falls short of everything I want to achieve:
files <- list.files(pattern = "*.csv") # Gather a list of everything in the directory that is a .csv file.
aconex <- lapply(files, fread) # Use lapply (I think this is correct) to apply the fread() function (from the data.table package) to each .csv file
This results in everything being stored in a list (one element per file), whereas I want the output to be a single data frame.
There has to be a better approach - I just can't seem to figure it out.
Can anybody suggest a better solution?
UPDATE:
Alternatively, I have written a for loop which partially achieves what I want; the problem is that it only saves a single file's worth of data to the data frame (there are 15 files in total):
for (x in list.files(pattern = "*.csv")) {
  df <- data.table::fread(x)
  df <- df[-(1:7), ]
  colnames(df) <- as.character(unlist(df[1, ]))
  df <- df[-(1), ]
}
Once the first seven rows have been removed, I then apply the first row as column names and then remove the first row. Again, what is a better way to achieve this?
Ideally, I want the resulting output to be x-number of data frames (df1, df2, .., dfX) that I can then merge, but, again, there has to be a better way - what is it?
Put simply, I want each file to be read into its own data frame, then for the value of row 8 to be used as the column headings, then the first eight rows removed (I only kept the eighth row in order to use it for the column headings before removing it).
Upvotes: 0
Views: 584
Reputation: 1743
This can be done with an anonymous function that reads each file with read.csv and drops the first seven rows via the skip argument. With header = TRUE, the eighth row is then used as the column names. Then you can stick all the data.frames together with do.call.
files <- list.files(pattern = "\\.csv$") # pattern is a regular expression, not a glob
# create f, which is a list of data frames (one per file)
f <- lapply(files, function(m) read.csv(m, skip = 7, header = TRUE))
# stick them all together with do.call-rbind
f_combine <- do.call("rbind", f)
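To check the base R approach end to end, here is a small self-contained sketch. The temporary directory, the file names and the column names (id, value) are made up for illustration; only read.csv, lapply and do.call come from the answer itself.

```r
# Hypothetical demo data: two CSV files, each with 7 junk lines before the header.
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:2) {
  lines <- c(paste("preamble line", 1:7),  # the seven rows to be skipped
             "id,value",                   # row 8: becomes the column names
             paste0(i, ",", c(10, 20)))    # two data rows per file
  writeLines(lines, file.path(demo_dir, paste0("file", i, ".csv")))
}

# Read every file, skipping the first seven lines, then stack the results.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
f <- lapply(files, function(m) read.csv(m, skip = 7, header = TRUE))
f_combine <- do.call("rbind", f)

str(f_combine)
```

Because skip = 7 is applied before the header is read, there is no need to promote a data row to column names and delete it afterwards.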
If you do need the speed provided by data.table::fread, you could modify the code as follows:
library(data.table)
# create f, which is a list of data.tables; modified with fread from data.table
f <- lapply(files, function(m) fread(m, skip = 7))
# use rbindlist this time
f_combine <- rbindlist(f)
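As for the for loop in the question: it keeps only one file because df is overwritten on every pass. If you would rather keep a loop, a minimal sketch is to store each file in its own list slot and bind once at the end. The demo files and the names pieces and combined are hypothetical, and read.csv is used only so the sketch runs without extra packages; fread(files[k], skip = 7) should be a drop-in replacement if data.table is loaded.

```r
# Hypothetical demo data: three CSV files with 7 junk lines before the header.
demo_dir <- file.path(tempdir(), "loop_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (i in 1:3) {
  writeLines(c(paste("junk", 1:7), "id,value", paste0(i, ",", i * 100)),
             file.path(demo_dir, paste0("part", i, ".csv")))
}

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# One list slot per file, so earlier files are not overwritten.
pieces <- vector("list", length(files))
for (k in seq_along(files)) {
  pieces[[k]] <- read.csv(files[k], skip = 7, header = TRUE)
}

# Bind all pieces in a single step after the loop.
combined <- do.call(rbind, pieces)
```

Binding once after the loop also avoids growing a data frame row by row inside it, which tends to get slow as the number of files increases.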