Combine and summaries multiple CSV from one folder in R

Question

I am trying to combine multiple CSV files from one folder on the computer (over 200 CSVs) into one new file with all of the relevant data that is needed using R. Below is an example of two of the current CSV files (example data can be found here: https://drive.google.com/drive/folders/1RbQ_cMNX5ys53OMISataTy-ZZlfk1SC8?usp=sharing):

I am trying to create a new CSV file that has the site name, number, number of records, start year, end year, and the number of years between the start and end year. It would look like this with a new row for each of the original CSVs in the folder:

Paul · Accepted Answer

Assuming your CSVs are in a folder ./data, get a vector of the file names.

library(tidyverse)

filenames <- list.files("./data", "*.csv", full.names = TRUE)

Create a function that reads in the data. Read in the first line to get the name. Then read in the rest of the data, skipping the first two rows.

process_csv <- function(filename) {
  name <- read_csv(filename, n_max = 1, col_names = FALSE)[[2]]
  
  number <- read_csv(filename, n_max = 1, col_names = FALSE, skip = 1)[[2]]
  
  read_csv(filename, skip = 2) %>%
    mutate(year = lubridate::year(lubridate::dmy_hm(DATE))) %>%
    summarise(
      count = n(),
      start_year = min(year),
      end_year = max(year),
      year_count = end_year - start_year
    ) %>%
    mutate(
      name = name,
      number = number,
      .before = 1
    )
}

Apply the function and save the output.

result <- map_dfr(filenames, process_csv)

write_csv(result, "result.csv")

result
#> # A tibble: 2 x 6
#>   name        number count start_year end_year year_count
#>                            
#> 1 NEW_PLACE_1    123   281       1972     2020         48
#> 2 NEW_PLACE_2    155   393       1961     2020         59

Combine and summaries multiple CSV from one folder in R

Answers (1)

Related Questions