I have hundreds of daily weather data files with a .txt extension, all in one folder, using a comma (",") as the separator. Every file has the same data structure but a different file name. The following is an example of the data structure:
$ year : int 1980 1980 1980 1980 1980 1980 1980 1980 1980 1980 ...
$ month : int 1 1 1 1 1 1 1 1 1 1 ...
$ day : int 1 2 3 4 5 6 7 8 9 10 ...
$ V1 : num 18.4 22.9 19.9 22.9 23.4 9.8 13.9 17.5 20.3 22.7 ...
$ V2 : num 30.8 31.5 31.4 31.3 31.5 29.8 30.1 30.6 30.5 31.1 ...
$ V3 : num 23.4 23.7 23.2 23.3 23.4 22.9 23 23.4 23.1 23.2 ...
$ V4 : num 2.2 0 0 0 0.9 3.6 3.5 3.7 1.2 0 ...
$ V5 : num 0.93 0.86 0.88 0.87 0.87 0.98 1 0.96 0.96 0.91 ...
$ V6 : num 1.6 3.5 5.2 5.5 3.9 4.2 4.2 4.9 4.9 4.4 ...
I need to compute the monthly total of one of the variables (let's say V4) for each file. The desired output for each file looks like this (the first column is the year, the second column is the month, and the third column is the sum of the daily values of V4):
Year 1 Month 1 22.1
Year 1 Month 2 82.4
Year 1 Month 3 142.8
Year 1 Month …etc 314
Year 2 Month 1 48.9
Year 2 Month 2 173.6
Year 2 Month 3 76.2
Year 2 Month …etc 517.4
Year 3 Month 1 117.8
Year 3 Month 2 20.1
Year 3 Month 3 169.8
Year 3 Month …etc 191.5
Then I need to export the results as a separate .txt file for each input file, with the new file named after the original (for example, before_file1.txt becomes result_file1.txt). I have a script using purrr, but nothing seems to happen. I would appreciate help improving the script with the right method. Thank you.
# Load packages
library(tidyverse)
library(dplyr)
library(purrr)
# Setting working directory
workingdirectory <- "D:/Directory"
setwd(workingdirectory)
# Listing the files in the folder with .txt extension
FilesList <- list.files(workingdirectory, pattern = "\\.txt$", full.names = TRUE)
# Looping per files
purrr::map(FilesList, ~{
.x %>%
# Read csv file
read.csv(sep = ",", header = FALSE, stringsAsFactors = FALSE) %>%
# select variables
variables <- c("year", "month", "day", "V4") %>%
# summarize monthly of V4
group_by(month, year) %>%
summarise(monthly = sum(V4)) %>%
})
# Write the data back
write.csv(paste0('Result_', basename(.x)), sep = ",", row.names = FALSE)
I have edited the script, but there is an error. Please help to fix it. Thanks
Error: unexpected '}' in:
"
}"
>
> # Write the data back
> write.csv(paste0('TM_', basename(.x)), sep = ",", row.names = FALSE)
Error in basename(.x) : object '.x' not found
In addition: Warning message:
In write.csv(paste0("TM_", basename(.x)), sep = ",", row.names = FALSE) :
attempt to set 'sep' ignored
Upvotes: 0
Views: 379
Reputation: 73
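I think you're already on the right track. The "unexpected '}'" error comes from the pipeline ending in a dangling %>% right before the closing brace, and ".x not found" appears because .x only exists inside the function passed to purrr::map, while your write.csv call sits outside it. My suggestion is to define the processing as a function before calling purrr::map, and to write each result inside the map call.
The code should then look something like this: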
# Load packages
library(tidyverse)
library(dplyr)
library(purrr)
# Setting working directory
workingdirectory <- "D:/Directory"
setwd(workingdirectory)
# Listing the files in the folder with .txt extension
FilesList <- list.files(workingdirectory, pattern = "\\.txt$", full.names = TRUE)
columnNames <- c("year", "month", "day", "pcp_day")
# define function
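processing <- function(x){
  x %>%
    # the files are comma-separated, as stated in the question
    read.csv(sep = ",", header = FALSE, stringsAsFactors = FALSE) %>%
    # name columns 1, 2, 3 and 7 (column 7 is V4 in the question's layout)
    rename_at(c(1,2,3,7), ~columnNames) %>%
    # drop 29 February (leap days); remove this line to keep them
    filter(month != 2 | day != 29) %>%
    group_by(month, year) %>%
    summarise(monthly = sum(pcp_day))
}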
# Loop over the files and write each result back
purrr::map(FilesList, ~processing(.x) %>% write.csv(paste0('Result_', basename(.x)), row.names = FALSE))
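If it runs successfully, you will find the Result_ files in the working directory. Note that this is the same folder as the inputs, so running the script a second time would also pick up the Result_ files as input.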
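For readers who want to try the idea without touching real data, here is a minimal base R sketch of the same read, sum per month, write loop. It is not the code from the answer: `monthly_total` is a hypothetical helper name, the input and output folders are throwaway directories under `tempdir()`, the three sample rows are made up, and it assumes comma-separated files with no header row and V4 in column 7. It also writes to a separate output folder so results are never re-read as inputs.

```r
# Hypothetical helper: sum one column per year and month for a single file.
# Assumes comma-separated files WITHOUT a header row, with columns ordered
# year, month, day, V1..V6 (so V4 is column 7).
monthly_total <- function(path, value_col = 7) {
  daily <- read.csv(path, header = FALSE, sep = ",", stringsAsFactors = FALSE)
  daily <- daily[, c(1, 2, value_col)]
  names(daily) <- c("year", "month", "value")
  # aggregate() sums `value` within each year/month combination
  out <- aggregate(value ~ year + month, data = daily, FUN = sum)
  out <- out[order(out$year, out$month), ]
  names(out)[3] <- "monthly"
  rownames(out) <- NULL
  out
}

# Demo on throwaway folders so nothing real is touched
in_dir  <- file.path(tempdir(), "weather_in")
out_dir <- file.path(tempdir(), "weather_out")
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)

# Made-up sample data: two days in January 1980 and one in February 1980
writeLines(c("1980,1,1,18.4,30.8,23.4,2.2,0.93,1.6",
             "1980,1,2,22.9,31.5,23.7,0.5,0.86,3.5",
             "1980,2,1,19.9,31.4,23.2,1.0,0.88,5.2"),
           file.path(in_dir, "before_file1.txt"))

files <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)
for (f in files) {
  result <- monthly_total(f)
  # before_file1.txt -> result_file1.txt, per the naming in the question
  out_name <- sub("^before_", "result_", basename(f))
  # write.table() honours `sep`, unlike write.csv(), which ignores it
  write.table(result, file.path(out_dir, out_name),
              sep = ",", row.names = FALSE, quote = FALSE)
}
```

To use it on real data, point `in_dir` and `out_dir` at actual folders and drop the `writeLines()` call. If the files do have a header row, `header = FALSE` and the column positions would need adjusting.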