Reputation: 657
I have 44 .csv files in my working directory that I will eventually read into R and bind into one large file. Before I do that, I'd like to make some changes to each of the files. I want to: 1) change the column names, and 2) keep only the first 10 columns of each file.
I've found some information on gsub for problem 1) but not enough to get me where I want to be. As for 2), it seems that this should be quite simple, but I can't find any solution online.
Many thanks!
Upvotes: 0
Views: 948
Reputation: 3379
This may work to get you the output you are looking for.
# Set path to folder
folder.path <- getwd()
# Get list of csv files in folder
# (pattern is a regular expression, not a glob, so match the extension explicitly)
filenames <- list.files(folder.path, pattern = "\\.csv$", full.names = TRUE)
# Read all csv files in the folder and create a list of dataframes
ldf <- lapply(filenames, read.csv)
# Select the first 10 columns in each dataframe in the list
ldf <- lapply(ldf, subset, select = 1:10)
# Create a vector for the new column names
new.col.names <- c("col1","col2","col3","col4","col5","col6","col7","col8","col9","col10")
# Assign the new column names to each dataframe in the list
ldf <- lapply(ldf, setNames, new.col.names)
# Combine each dataframe in the list into a single dataframe
df.final <- do.call("rbind", ldf)
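If the new names should be derived from the existing ones rather than replaced wholesale, gsub can be applied to names() inside the same lapply pattern. This is only a sketch: the sample files, the "raw_" prefix, and the helper clean_names are made up for illustration and are not from the question.

```r
# Hypothetical demo data: two small CSV files in a temporary folder
demo.dir <- file.path(tempdir(), "csv_demo_gsub")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(data.frame(raw_id = 1:2, raw_value = c(10, 20)),
          file.path(demo.dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(raw_id = 3:4, raw_value = c(30, 40)),
          file.path(demo.dir, "b.csv"), row.names = FALSE)

# Hypothetical helper: strip an assumed "raw_" prefix from every column name
clean_names <- function(df) {
  names(df) <- gsub("^raw_", "", names(df))
  df
}

demo.files <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)
demo.ldf <- lapply(demo.files, read.csv)
demo.ldf <- lapply(demo.ldf, clean_names)
demo.final <- do.call("rbind", demo.ldf)
names(demo.final)
```

The regular expression passed to gsub is the only part that would need to change for a different renaming rule.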
Upvotes: 1
Reputation: 1797
readLines is your friend. Try importing each file as a separate character vector, e.g. my_csv <- readLines("path/to/your/csv"), then perform the modifications and finally save the output as follows:
# Write the modified lines directly; capture.output() would save the printed form (index prefixes and quotes)
cat(my_csv, file = "my_new.csv", sep = "\n", append = FALSE)
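For the column-name part, the line-based approach only needs to touch the first element of the vector, since that is where the header lives. A minimal sketch, assuming the modification is replacing spaces with underscores; the demo file and its contents are invented for illustration:

```r
# Hypothetical demo file with a header line and two data lines
demo.in <- tempfile(fileext = ".csv")
writeLines(c("old name,other col", "1,2", "3,4"), demo.in)

my_csv <- readLines(demo.in)

# Only the first line holds the column names, so restrict gsub to it;
# replacing spaces with underscores is an assumed modification
my_csv[1] <- gsub(" ", "_", my_csv[1])

# Write the lines back out unchanged apart from the header
demo.out <- tempfile(fileext = ".csv")
writeLines(my_csv, demo.out)
readLines(demo.out)
```

Dropping columns this way is harder, because splitting lines on commas breaks on quoted fields, so a proper CSV reader is the safer tool for that step.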
I would strongly recommend the data.table package, and in particular the fread() function, which imports CSVs quickly (as data.table objects); you can then select the first 10 columns and alter the names on those objects. Of course, via fwrite() you can write them back to CSV at any time.
Use this approach only if the columns of every CSV have the same position and name, so that keeping only the first 10, as you mentioned above, gives the same columns in each file.
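A single-file round trip with fread() and fwrite() might look like the following. This is a sketch under assumptions: the 12-column demo file and the col1..col10 names are placeholders, and the data.table package is assumed to be installed.

```r
library(data.table)

# Hypothetical demo file with 12 columns (V1..V12) and 2 rows
demo.in <- tempfile(fileext = ".csv")
fwrite(as.data.table(matrix(1:24, nrow = 2)), demo.in)

# select = 1:10 reads only the first 10 columns
dt <- fread(demo.in, select = 1:10)

# setnames() renames in place; the new names are placeholders
setnames(dt, paste0("col", 1:10))

# Write the trimmed, renamed table back to CSV
demo.out <- tempfile(fileext = ".csv")
fwrite(dt, demo.out)
names(fread(demo.out))
```

Selecting at read time avoids loading columns that would be discarded anyway.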
A combination of lapply and data.table can do miracles. In particular:
rbindlist(lapply(list.files("path/to/the/folder/with/csvs", pattern = "\\.csv$", full.names = TRUE), fread), use.names = TRUE, fill = FALSE)
will solve most of your data import issues.
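When binding many files it can help to record which file each row came from. A sketch of that, assuming data.table is installed; the demo folder, the file contents, and the "source" column name are made up for illustration:

```r
library(data.table)

# Hypothetical demo data: two CSV files with identical columns
demo.dir <- file.path(tempdir(), "csv_demo_bind")
dir.create(demo.dir, showWarnings = FALSE)
fwrite(data.table(x = 1:2, y = 3:4), file.path(demo.dir, "a.csv"))
fwrite(data.table(x = 5:6, y = 7:8), file.path(demo.dir, "b.csv"))

demo.files <- list.files(demo.dir, pattern = "\\.csv$", full.names = TRUE)

# Name the list by file so that idcol records each row's source file
demo.list <- lapply(demo.files, fread)
names(demo.list) <- basename(demo.files)

demo.all <- rbindlist(demo.list, use.names = TRUE, fill = FALSE,
                      idcol = "source")
demo.all
```

With fill = FALSE, rbindlist stops with an error if the files do not share the same columns, which is a useful check before combining 44 files.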
Upvotes: 0