Reputation: 139
I am trying to combine multiple CSV files from one folder on the computer (over 200 CSVs) into one new file with all of the relevant data that is needed using R
. Below is an example of two of the current CSV files (example data can be found here: https://drive.google.com/drive/folders/1RbQ_cMNX5ys53OMISataTy-ZZlfk1SC8?usp=sharing):
I am trying to create a new CSV file that has the site name, number, number of records, start year, end year, and the number of years between the start and end year. It would look like this with a new row for each of the original CSVs in the folder:
Upvotes: 2
Views: 205
Reputation: 9107
Assuming your CSVs are in a folder ./data
, get a vector of the file names.
library(tidyverse)
filenames <- list.files("./data", "*.csv", full.names = TRUE)
Create a function that reads in the data. Read in the first line to get the name
. Then read in the rest of the data, skipping the first two rows.
process_csv <- function(filename) {
name <- read_csv(filename, n_max = 1, col_names = FALSE)[[2]]
number <- read_csv(filename, n_max = 1, col_names = FALSE, skip = 1)[[2]]
read_csv(filename, skip = 2) %>%
mutate(year = lubridate::year(lubridate::dmy_hm(DATE))) %>%
summarise(
count = n(),
start_year = min(year),
end_year = max(year),
year_count = end_year - start_year
) %>%
mutate(
name = name,
number = number,
.before = 1
)
}
Apply the function and save the output.
result <- map_dfr(filenames, process_csv)
write_csv(result, "result.csv")
result
#> # A tibble: 2 x 6
#> name number count start_year end_year year_count
#> <chr> <dbl> <int> <dbl> <dbl> <dbl>
#> 1 NEW_PLACE_1 123 281 1972 2020 48
#> 2 NEW_PLACE_2 155 393 1961 2020 59
Upvotes: 2