Bruno Avila

Reputation: 296

Import many files using loop

I have several files in CSV format, and I need to import them and turn them into a single data frame (DF) using a "for" loop. My files are named:

FILE1.CSV; FILE2.CSV; FILE3.CSV

#FILE1
NAME<- c("JOHN","DONALD","CARL")
PRICE <- c(50, 60, 70)
FILE1 <- data.frame(NAME,PRICE)

#FILE2
NAME<- c("MICHAEL","CRIS","MARY")
PRICE <- c(12, 33, 78)
CITY<- c("NY", "LA","LON")
FILE2 <- data.frame(NAME,PRICE,CITY)

#FILE3
NAME<- c("PAUL","BROWN","WAL")
PRICE <- c(99, 54, 22)
CITY<- c("PAR","RIO","LIS")
POP<- c(150,369,871)
FILE3 <- data.frame(NAME,PRICE,CITY,POP)

Before combining them into the DF, I want to process each file. Suppose the import, processing, and conversion to a DF follow this sequence:

#PART 1
require(tidyverse)
setwd("D:/")

#PART 2
list_file <- list.files(pattern = "*.csv") %>% lapply(read.csv, sep=";")

I have an error here: only the first file (FILE1) is transformed into a DF, and the rest are not. I don't know how to fix it.

# PART 3
for (i in 1:seq_along(list_file)){
  DF<- as_tibble(list_file[[i]]) %>% select(NAME,PRICE) # Only the variables “NAME” and “PRICE” will be used.
}

From here, I want to import FILE2, process it, and append it to the existing DF (rbind), and so on until my last file (FILE3). So how would I: 1) import each file; 2) process it; and 3) add it to the DF?

Upvotes: 1

Views: 74

Answers (1)

akrun

Reputation: 886938

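Two things go wrong in the loop. First, 1:seq_along(list_file) uses only the first element of seq_along(list_file) (R gives a warning), so it evaluates to 1:1 and the loop runs just once; seq_along(list_file) on its own is what we need. Second, 'DF' is overwritten in each iteration, so we need to store each result in a list: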

lst1 <- vector('list', length(list_file))
for (i in seq_along(list_file)){
  lst1[[i]] <- as_tibble(list_file[[i]]) %>%
    select(NAME, PRICE)
}

From the list, we can use bind_rows from dplyr to bind them. rbind won't work if the column names don't match or if one list element has extra columns that another lacks.

bind_rows(lst1)
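To see the difference between rbind and bind_rows, here is a minimal sketch that rebuilds FILE1 and FILE2 from the question in memory, so no files are needed. The names rbind_result and combined are only for illustration.

# Data frames from the question, rebuilt in memory
library(dplyr)

FILE1 <- data.frame(NAME = c("JOHN", "DONALD", "CARL"), PRICE = c(50, 60, 70))
FILE2 <- data.frame(NAME = c("MICHAEL", "CRIS", "MARY"), PRICE = c(12, 33, 78),
                    CITY = c("NY", "LA", "LON"))

# rbind() stops with an error because the column counts differ;
# tryCatch() turns that error into a marker string (illustrative only)
rbind_result <- tryCatch(rbind(FILE1, FILE2), error = function(e) "rbind failed")

# bind_rows() matches columns by name and fills the missing CITY values with NA
combined <- bind_rows(FILE1, FILE2)

Once every element has been reduced to NAME and PRICE, as in the loop above, the columns match and either function would work; bind_rows is simply the safer default.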

Also, as we are already using tidyverse functions, we can read, select, and row-bind in a single step with map_dfr:

library(dplyr)
library(tidyr)
library(purrr)
library(readr)
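# pattern is a regular expression; ignore.case also matches FILE1.CSV
out <- map_dfr(list.files(pattern = "\\.csv$", ignore.case = TRUE), ~
          read_delim(.x, delim = ";") %>%  # files are ';'-separated, as in the question
             select(NAME, PRICE)
 )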

Or, if we use fread from data.table, its select argument reads only the columns of interest (fread detects the separator automatically):

library(data.table)
rbindlist(lapply(list.files(pattern = "\\.csv$", ignore.case = TRUE), fread,
         select = c("NAME", "PRICE")))
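To try the three steps (import, select, bind) end to end without touching real data, here is a rough sketch that writes the example files from the question to a scratch folder first. The folder name csv_demo and the variables tmp, files and DF are assumptions made for this illustration, and the files are written semicolon-separated to match the sep=";" used in the question.

library(dplyr)

# Hypothetical scratch directory so nothing in the working directory is touched
tmp <- file.path(tempdir(), "csv_demo")
dir.create(tmp, showWarnings = FALSE)

# Recreate the three example files as semicolon-separated CSVs
FILE1 <- data.frame(NAME = c("JOHN", "DONALD", "CARL"), PRICE = c(50, 60, 70))
FILE2 <- data.frame(NAME = c("MICHAEL", "CRIS", "MARY"), PRICE = c(12, 33, 78),
                    CITY = c("NY", "LA", "LON"))
FILE3 <- data.frame(NAME = c("PAUL", "BROWN", "WAL"), PRICE = c(99, 54, 22),
                    CITY = c("PAR", "RIO", "LIS"), POP = c(150, 369, 871))
for (nm in c("FILE1", "FILE2", "FILE3")) {
  write.table(get(nm), file.path(tmp, paste0(nm, ".csv")),
              sep = ";", row.names = FALSE, quote = FALSE)
}

# 1) import every file, 2) keep NAME and PRICE, 3) stack the results
files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
DF <- files %>%
  lapply(read.csv, sep = ";") %>%
  lapply(function(d) select(d, NAME, PRICE)) %>%
  bind_rows()

With the real files, pointing list.files at the actual folder (D:/ in the question) should be the only change needed.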

Upvotes: 2
