ac1998
ac1998

Reputation: 21

Import multiple CSV files into R and combine first two rows as the header column

I have about 50 CSV files that need to be imported into R, and then I want to set a header row which is the first two rows of data in each file. As an example:

DATE      FIRST  LAST ID
(M/D/Y)   NAME   NAME 

The data starts in Row 3. I can import all the files as their own unique dataframe which

temp = list.files(pattern="*.csv")
list2env(
  lapply(setNames(temp, make.names(gsub("*.csv$", "", temp))), 
         read.csv), envir = .GlobalEnv)

But how can I change that code to combine the first two rows as a header

Upvotes: 0

Views: 1031

Answers (1)

Gregor Thomas
Gregor Thomas

Reputation: 145775

I'd recommend reading in the first two rows, making the column names you want, and then read in the rest, skipping the first 2 rows. Here's an example for one file, found at filepath x:

cnames = read.csv(x, nrow = 2)
cnames = sapply(cnames, paste, collapse = "")
data = read.csv(x, skip = 2, col.names = cnames)

I'd also strongly encourage you to leave your results in a nice, easy-to-use list, rather than creating about 50 data frames in your environment. My reasoning for this recommendation is detailed here. That could look like this:

temp = list.files(pattern="*.csv")
data_list = list()
for(i in seq_along(temp)) {
  cnames = read.csv(x, nrow = 2)
  cnames = sapply(cnames, paste, collapse = "")
  data_list[[i]] = read.csv(x, skip = 2, col.names = cnames) 
}

names(data_list) = sub(".csv", "", basename(temp), fixed = TRUE)

# Then if you want to combine them all
complete_data = dplyr::bind_rows(data_list, .id = "source_file")

Upvotes: 1

Related Questions