piping multi-step data wrangling/transformation in a single call

Question

I am loading data in using the list.files function that concatenates multiple .csv files into a single data frame. However, I then have to run another line that finds and removes rows that meet a certain criteria. I then run another line that combines the date and time columns into a Date_time coulmn. What I'm working with is below:

#Identifies all csv files in a folder and combines them into one continuous data frame.     
exports_15E<- {list.files(path = "Z:/fishproj/Cambodia Dai project/Analytic/TS2020_Data_Exports",    
                           pattern = "^15E.*FullWaterColumn_100m.sv.csv", full.names = TRUE) %>% 
        lapply(read_csv) %>%                                            
        bind_rows}
#Removes zero values form the data frame
exports_15E<- exports_15E[exports_15E$Layer !=0,]
#Combines date and time columns into one column. 
exports_15E$Date_time <- as.POSIXct(paste(as.Date(as.character(exports_15E$Date_M),"%Y%m%d"), exports_15E$Time_M, sep=" "),format = "%Y-%m-%d %H:%M", tz="Asia/Bangkok")

What I'm trying to do is just combine those 3 functions or just write the syntax in a way that it will just run all of the above and create a single data frame out of it.

Example data frame:

     Layer Time_M  Date_M    Sv_mean
 1     1 18:22:22 20201201   -60.2
 2     1 19:22:23 20201201   -59.8
 3     1 20:22:23 20201201   -60.3
 4     1 21:22:23 20201201   -62.3
 5     1 22:22:23 20201201   -64.2
 6     0 00:00:00 00000000    9999
 7     0 00:00:00 00000000    9999
 8     0 00:00:00 00000000    9999
 9     0 00:00:00 00000000    9999
10     0 00:00:00 00000000    9999

Desired output:

         Layer Time_M  Date_M   Sv_mean       Date_time
     1     1 18:22:22 20201201   -60.2   2020-12-01 18:22:22 
     2     1 19:22:23 20201201   -59.8   2020-12-01 19:22:22 
     3     1 20:22:23 20201201   -60.3   2020-12-01 20:22:22 
     4     1 21:22:23 20201201   -62.3   2020-12-01 21:22:22 
     5     1 22:22:23 20201201   -64.2   2020-12-01 22:22:22

The code is meant to create this from my .csv files (which is simple enough), then I want the code to remove the rows with layers that read "0", and also combine the "Date_M" and "Time_M" columns into a "Date_time" column. The code I have works and does all this, I just want it to run all the strings at once.

GuedesBF · Accepted Answer

I would use a single pipe %>%chain. I think dplyr and purrr are most helpful here. Specially, these packages allow the awesome map_dfr() ("map-data.frame-rowwise") and strightforward filter() functions, which make data wrangling a fluid process. Something like this may work (hard to tell without a proper reproducible example).

library(dplyr)
library(purrr)

exports_15E<- list.files(path = "Z:/fishproj/Cambodia Dai project/Analytic/TS2020_Data_Exports",    
                           pattern = "^15E.*FullWaterColumn_100m.sv.csv", full.names = TRUE) %>% 
        map_dfr(read_csv) %>%
        filter(Layer !=0) %>%
        mutate(Date_time = as.POSIXct(paste(as.Date(as.character(Date_M),"%Y%m%d"), Time_M, sep=" "),format = "%Y-%m-%d %H:%M", tz="Asia/Bangkok"))

piping multi-step data wrangling/transformation in a single call

Answers (1)

Related Questions