Robin
Robin

Reputation: 459

R: how to read multiple csv files with column name in row n and select certain columns from the file and add file name to the file as a new column?

I have 100 csv files in the same folder, let's say the path="D:\Data".

For each file I want to:

Step 1. read the file from row 12 since the column names are at row 12;

Step 2. select certain columns from the file, let's say the colname I want to keep are "Date","Time","Value";

Step 3. add the file name to the file as a new column, for example, I want to save file1 of which name is "example 1.csv" as file1$Name="example 1.csv", and similarly, save file2 of which name is "example 2.csv" as file2$Name="example 2.csv", etc...

So far we got 100 new files with 4 columns "Date","Time","Value","Name". Then finally rbind all the 100 new files together.

I have no idea how to code these steps all together in R. So anyone can help? Thanks very much for your time.


update

Due the complicated data structure in my data, it always return errors by using the sample code in answers. The ideas behind the code were correct, but somehow I could only solve the problem by using the code as below. I believe there would be more elegant way to modify my code instead of using loop.

# set up working directory
setwd("D:/Data")

library(data.table) 
files <- list.files(path ="D:/Data", pattern = ".csv")

# read and save each file as a list of data frame in temp
temp <- lapply(files, read.csv, header = TRUE, skip=11, sep = "\t", fileEncoding="utf-16")

seq_along(temp) # the number of files is 112

## select columns "Date","Time","Value" as a new file, 
## and attach the file name as a new column to each new file, 
## and finally row bind all the files together 

temp2=NULL
for(i in 1:112) {
  dd=cbind(File=files[i],temp[[i]][,c("Date","Time","Value")])
  temp2=rbind(temp2,dd)
}

Upvotes: 0

Views: 1253

Answers (2)

Julian
Julian

Reputation: 9260

You could try something like this

list_of_files <- list.files(path <- "D:/Data/", pattern="*.csv", full.names=TRUE)

library(dplyr)
library(purrr)
list_of_files %>%
  set_names() %>%
  map_dfr(~ .x %>%
            readr::read_csv(.,
                     skip = 12, 
                     col_names = TRUE
            ) %>% 
            select(Date, Time, Value) %>% 
  mutate(Date = as.character(Date)) %>% 
# Alternatively you could use the .id argument in map_dfr for the filename
            mutate(filename = match(.x, list_of_files)))

Upvotes: 2

Richard Berry
Richard Berry

Reputation: 396

You can do this very neatly with vroom. It can take a list of files as an argument rather than having to do each separately, and add the filename column itself:

library(vroom)

vroom(files, skip = 11, id = 'filename', col_select = c(Date, Time, Value, filename))

Upvotes: 2

Related Questions