SHILLATE
SHILLATE

Reputation: 43

How do I download and combine multiple files in r?

I am endeavoring to combine files but find myself writing very redundant code, which is cumbersome. I have looked at the documentation but for some reason cannot find anything about how to do this.

Basically, I download the code from my native machine, and then want to combine the exact same columns for each file (the only difference is year).

Can you help?

The issue is that this seems inefficient. Is there a more efficient way?

Upvotes: 1

Views: 661

Answers (2)

DuckPyjamas
DuckPyjamas

Reputation: 1659

At the risk of seeming icky for self-promotion, I wrote a function that does exactly this (desiderata::apply_to_files()):

# Apply a function to every file in a folder that matches a regex pattern

rain <- apply_to_files(path = "Raw data/Rainfall", pattern = "csv", 
                       func = readr::read_csv, col_types = "Tiic", 
                       recursive = FALSE, ignorecase = TRUE, 
                       method = "row_bind")

dplyr::sample_n(rain, 5)

#> # A tibble: 5 x 5
#> 
#>   orig_source_file       Time                 Tips    mV Event 
#>   <chr>                  <dttm>              <int> <int> <chr> 
#> 1 BOW-BM-2016-01-15.csv  2015-12-17 03:58:00     0  4047 Normal
#> 2 BOW-BM-2016-01-15.csv  2016-01-03 00:27:00     2  3962 Normal
#> 3 BOW-BM-2016-01-15.csv  2015-11-27 12:06:00     0  4262 Normal
#> 4 BIL-BPA-2018-01-24.csv 2015-11-15 10:00:00     0  4378 Normal
#> 5 BOW-BM-2016-08-05.csv  2016-04-13 19:00:00     0  4447 Normal

In this case, all of the files have identical columns and order (Time, Tips, mV, Event), so I can just method = "row_bind" and the function will automatically add the filename as an extra column. There are other methods available:

"full_join" (the default) returns all columns and rows. "left_join" returns all rows from the first file, and all columns from subsequent files. "inner_join" returns rows from the first file that have matches in subsequent files.

Internally, the function builds a list of files in the path (recursive or not), runs an lapply() on the list, and then handles merging the new list of dataframes into a single dataframe:

apply_to_files <- function(path, pattern, func, ..., recursive = FALSE, ignorecase = TRUE, 
                          method = "full_join") {
    file_list <- list.files(path = path,
                            pattern = pattern,
                            full.names = TRUE,      # Return full relative path.
                            recursive = recursive,  # Search into subfolders.
                            ignore.case = ignorecase)

    df_list <- lapply(file_list, func, ...)

    # The .id arg of bind_rows() uses the names to create the ID column.
    names(df_list) <- basename(file_list)

    out <- switch(method,
                  "full_join"  = plyr::join_all(df_list, type = "full"),
                  "left_join"  = plyr::join_all(df_list, type = "left"),
                  "inner_join" = plyr::join_all(df_list, type = "inner"),
                  # The fancy joins don't have orig_source_file because the values were
                  # getting all mixed together.
                  "row_bind"   = dplyr::bind_rows(df_list, .id = "orig_source_file"))

    return(invisible(out))
}

Upvotes: 1

NColl
NColl

Reputation: 757

You can make a list of the .csv files in the folder and then read them all together into a single df with purrr::map_df. You can add a column to differentiate between files then

library(tidyverse)

df <- list.files(path="C://SAM", 
                  pattern="*.csv") %>%
  purrr::map_df(function(x) readr::read_csv(x) %>%
  mutate(filename=gsub(" .csv", "", basename(x)))

Upvotes: 1

Related Questions