Reputation: 25
I have incredibly raw data in the form of a .zip with a .txt file inside. For the most part it reads in cleanly with read_csv, but on some lines the logger has written something else entirely, which completely skews the column structure. This data has no chance of being fixed.
When using read_csv, those lines show up as parsing problems. I want to set up my code so that if this problem appears anywhere in a file, the whole file is ignored, ideally with a log of which files were ignored/thrown out. I looked into possibly(), but since the parsing problems are only warnings on individual lines rather than a full error for the file, it doesn't skip the file (a rough sketch of that attempt is below, after the reprex).
This is my code at the moment.
library(dplyr)
library(readr)
library(purrr)

read_log <- function(path) {
  read_csv(path, col_types = cols(.default = col_character())) %>%
    mutate(filename = basename(path))
}
test_files <- file.path("example.txt") #would normally be list.files, simplified for this reprex
raw_data <- map_dfr(test_files, read_log)
#> Warning: 6 parsing failures.
#> row col expected actual file
#> 3 -- 17 columns 4 columns 'example.txt'
#> 4 -- 17 columns 23 columns 'example.txt'
#> 5 -- 17 columns 23 columns 'example.txt'
#> 6 -- 17 columns 23 columns 'example.txt'
#> 7 -- 17 columns 23 columns 'example.txt'
#> ... ... .......... .......... .............
#> See problems(...) for more details.
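For reference, the possibly() attempt looked roughly like this. It doesn't skip the file because possibly() only intercepts errors, and read_csv reports the broken lines as warnings:
# sketch of the possibly() attempt: parsing failures are warnings, not errors,
# so the malformed file is still read and returned instead of being skipped
safe_read <- possibly(read_log, otherwise = NULL)
raw_data <- map_dfr(test_files, safe_read)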
Upvotes: 0
Views: 777
Reputation: 388962
You can return NULL when read_csv raises a warning. Try this function, which wraps the call in tryCatch():
library(readr)
library(purrr)
library(dplyr)

read_log <- function(path) {
  # NULL when read_csv raises any warning (e.g. parsing failures)
  data <- tryCatch(
    read_csv(path, col_types = cols(.default = col_character())),
    warning = function(e) return(NULL)
  )
  if (!is.null(data))
    data <- data %>% mutate(filename = basename(path))
  return(data)
}
Read the data with map() instead of map_dfr(), so the NULL elements are kept and you can see which files were dropped:
all_data <- map(test_files, read_log)
Files which were not read:
not_read_files <- test_files[sapply(all_data, is.null)]
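If you also want that log written to disk (as mentioned in the question), a minimal sketch; the file name skipped_files.log is just a placeholder:
# writes one skipped file path per line; "skipped_files.log" is an arbitrary name
writeLines(not_read_files, "skipped_files.log")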
Combine the data:
total_data <- bind_rows(all_data)
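Note that the warning handler above drops a file on any warning, not just parsing failures. If you want to skip only files with broken lines, here is a sketch assuming readr 1.x, where problems() returns the parsing problems recorded on the result of read_csv:
# assumption: readr 1.x, where problems() returns a (possibly empty) tibble
# of parsing failures attached to the result of read_csv
read_log_strict <- function(path) {
  data <- suppressWarnings(
    read_csv(path, col_types = cols(.default = col_character()))
  )
  if (nrow(problems(data)) > 0) return(NULL)  # skip files with broken lines
  data %>% mutate(filename = basename(path))
}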
Upvotes: 1