Reputation: 21
I have about 50 CSV files that need to be imported into R, and then I want to set a header row which is the first two rows of data in each file. As an example:
DATE FIRST LAST ID
(M/D/Y) NAME NAME
The data starts in Row 3. I can import all the files as their own unique dataframe which
temp = list.files(pattern="*.csv")
list2env(
lapply(setNames(temp, make.names(gsub("*.csv$", "", temp))),
read.csv), envir = .GlobalEnv)
But how can I change that code to combine the first two rows as a header
Upvotes: 0
Views: 1031
Reputation: 145775
I'd recommend reading in the first two rows, making the column names you want, and then read in the rest, skipping the first 2 rows. Here's an example for one file, found at filepath x
:
cnames = read.csv(x, nrow = 2)
cnames = sapply(cnames, paste, collapse = "")
data = read.csv(x, skip = 2, col.names = cnames)
I'd also strongly encourage you to leave your results in a nice, easy-to-use list
, rather than creating about 50 data frames in your environment. My reasoning for this recommendation is detailed here. That could look like this:
temp = list.files(pattern="*.csv")
data_list = list()
for(i in seq_along(temp)) {
cnames = read.csv(x, nrow = 2)
cnames = sapply(cnames, paste, collapse = "")
data_list[[i]] = read.csv(x, skip = 2, col.names = cnames)
}
names(data_list) = sub(".csv", "", basename(temp), fixed = TRUE)
# Then if you want to combine them all
complete_data = dplyr::bind_rows(data_list, .id = "source_file")
Upvotes: 1