LDT

Reputation: 3108

Choose command order in a function based on an error [R]

I have three files in a folder with the following names:

./multiqc_data$ ls 
file1.json
file2.json
file3.json

When I open the files with the TidyMultiqc package, NaN values present in the files can lead to the following error:

library(TidyMultiqc)   # provides load_multiqc()
library(purrr)         # provides map()
library(magrittr)      # provides the %>% pipe

files <- dir(path, pattern = "*.json")       #locate files
files %>% 
  map(~ load_multiqc(file.path(path, .)))    #parse them

## the error
Error in parse_con(txt, bigint_as_char) : 
  lexical error: invalid char in json text.
                  "mapped_failed_pct": NaN,                 "paired in
                     (right here) ------^

I want to create a function to handle this error.

Every time this error pops up, I want to be able to apply this sed command to all of the files in the folder:

system(paste("gsed -i 's/NaN/null/g'",paste0(path,"*.json")))

Any ideas how I can achieve this?

Upvotes: 3

Views: 140

Answers (3)

jared_mamrot

Reputation: 26225

Based on this open GitHub issue, a potential solution provided by Peter Diakumis is to use RJSONIO::fromJSON() in place of jsonlite::read_json(). You could adapt this solution to your use case by, for example, creating your own version of the load_multiqc() function:

library(RJSONIO)

load_multiqc_bugfix <- function(paths,
                                plots = NULL,
                                find_metadata = function(...) {
                                  list()
                                },
                                plot_parsers = list(),
                                sections = "general") {
  assertthat::assert_that(all(sections %in% c(
    "general", "plot", "raw"
  )), msg = "Only 'general', 'plot' and 'raw' (and combinations of those) are valid items for the sections parameter")

  # Vectorised over paths
  paths %>%
    purrr::map_dfr(function(path) {
      parsed <- RJSONIO::fromJSON(path)

      # The main data is plots/general/raw
      main_data <- sections %>%
        purrr::map(~ switch(.,
          general = parse_general(parsed),
          raw = parse_raw(parsed),
          plot = parse_plots(parsed, plots = plots, plot_parsers = plot_parsers)
        )) %>%
        purrr::reduce(~ purrr::list_merge(.x, !!!.y), .init = list()) %>%
        purrr::imap(~ purrr::list_merge(.x, metadata.sample_id = .y))

      # Metadata is defined by a user function
      metadata <- parse_metadata(parsed = parsed, samples = names(main_data), find_metadata = find_metadata)
      purrr::list_merge(metadata, !!!main_data) %>%
        dplyr::bind_rows()
    }) %>%
    # Only arrange the columns if we have at least 1 column
    `if`(
      # Move the columns into the order: metadata, general, plot, raw
      ncol(.) > 0,
      (.) %>%
        dplyr::relocate(dplyr::starts_with("raw")) %>%
        dplyr::relocate(dplyr::starts_with("plot")) %>%
        dplyr::relocate(dplyr::starts_with("general")) %>%
        dplyr::relocate(dplyr::starts_with("metadata")) %>%
        # Always put the sample ID at the start
        dplyr::relocate(metadata.sample_id),
      .
    )
}
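
Note that parse_general(), parse_raw(), parse_plots() and parse_metadata() appear to be internal helpers from the TidyMultiqc source, so outside the package namespace you would likely need to reference them via TidyMultiqc::: (or copy them into your session) for the function above to run. A hypothetical call on the files from the question could then look like:

files <- dir(path, pattern = "*.json")                # locate files
df <- load_multiqc_bugfix(file.path(path, files))     # parse them via RJSONIO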

Upvotes: 0

moodymudskipper

Reputation: 47350

You could use this wrapper:

safe_load_multiqc <- function(path, file) {
  tryCatch(load_multiqc(file.path(path, file)), error = function(e) {
    # on error, replace NaN with null in every .json file of the folder ...
    system(paste("gsed -i 's/NaN/null/g'", paste0(path, "*.json")))
    # ... then retry
    load_multiqc(file.path(path, file))
  })
}
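
A hypothetical way to apply it to every file in the folder (assuming purrr is loaded for map()):

files <- dir(path, pattern = "*.json")                       # locate files
results <- purrr::map(files, ~ safe_load_multiqc(path, .))   # parse, retrying after the sed fix on error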

Upvotes: 4

Ric

Reputation: 5721

A good way to handle errors in pipelines like this is to use conditions and restarts, via withCallingHandlers() and withRestarts().

You establish the condition handlers and the recovery protocols (restarts), and then you can choose which protocols to use and in which order. Calling handlers allow much finer control over error conditions than a plain tryCatch().

In the example, I wrote two handlers: removeNaN (which works at folder level) and skipFile (which works at file level); if the first one fails, the second is executed (simply skipping the file). Of course, this is just an example.

I think in your case you could simply run sed every time; nevertheless, I hope this answer meets your request for a canonical approach.

Inspiration and further reading: Beyond Exception Handling: Conditions and Restarts
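
As a minimal, self-contained sketch of the mechanism (not specific to multiqc; the function and restart names here are made up for illustration): the low-level code offers a restart, and the calling handler decides whether to invoke it.

risky <- function(x) {
  withRestarts(
    if (x < 0) stop("negative input") else sqrt(x),
    useZero = function() 0                           # recovery protocol offered to handlers
  )
}

withCallingHandlers(
  error = function(e) tryInvokeRestart("useZero"),   # handler picks the protocol
  print(sapply(c(4, -1, 9), risky))                  # the -1 element recovers to 0
)

Applied to your pipeline, the full example looks like this: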


library(purrr)      # map()
library(magrittr)   # %>% pipe

path <- "../your_path"

# function that does the error_prone task
do_task <- function(path){
  files <- dir(path,pattern = "*.json")        #locate files
  files %>% 
    map(~ withRestarts(                        # set an alternative restart
      load_multiqc(file.path(path, .)),        # parsing
      skipFile = function() {                  # if fails, skip only this file      
        message(paste("skipping ", file.path(path, .)))
        return(NULL)
      }))   
}

# error handler that invokes "removeNaN"
removeNaNHandler <- function(e)  tryInvokeRestart("removeNaN")
# error handler that invokes "skipFile"
skipFileHandler <- function(e) tryInvokeRestart("skipFile")

# run the task with handlers in case of error
withCallingHandlers(
  condition = removeNaNHandler,    # call handler (on generic error)
  # condition = skipFileHandler,     # if previous fails skips file
  {
    # run with recovery protocols (can define more than one)
    withRestarts({
      do_task(path)},   
      removeNaN = function()   # protocol "removeNaN"  
      {               
        system(paste("gsed -i 's/NaN/null/g'",paste0(path,"*.json")))
        do_task(path)      # try again
      }
    )
  }
)

Upvotes: 1
