The Great

Elegant way to invoke multiple functions using R

I have a set of preprocessing functions that I call in sequence to create the final data frame. As you can see, the output of each stage is passed as the input to the next stage.

wrap_func = function(DF){  # Wrapper that chains all the preprocessing steps
  T1 = transform_ids(DF)
  print("Id transformation success")   # print statements are used for debugging purposes
  T2 = transform_dates(T1)
  print("date transformation success")
  T3 = measurement_ids(T2)
  print("measurement_ids transformation success")
  T4 = oper_val_concepts(T3)
  print("operator transformation success")
  T5 = range_values(T4)
  print("range capture success")
  T6 = col_format(T5)
  print("column formatting success")
  T7 = missing_impute(T6, def_val)
  print("Missing value Imputation success")
  T7 = T7[c("measurement_id", "person_id")]   # reorder the columns for display
  return(T7)
}

DF = wrap_func(dfm)

Is there any elegant way to write this?

This related post has a similar scenario, but it's in Python.

Can you help me make this more elegant in R?

Answers (1)

Samsa

One solution would be:

library(magrittr)   # provides %>% and %T>%

pipeline <- function(DF){
  DF %>%
    transform_ids() %T>%
    {cat("Id transformation success\n")} %>%
    transform_dates() %T>%
    {cat("date transformation success\n")} %>%
    measurement_ids() %T>%
    {cat("measurement_ids transformation success\n")} %>%
    oper_val_concepts() %T>%
    {cat("operator transformation success\n")} %>%
    range_values() %T>%
    {cat("range capture success\n")} %>%
    col_format() %T>%
    {cat("column formatting success\n")} %>%
    missing_impute(def_val) %T>%
    {cat("Missing value Imputation success\n")} %>%
    .[c("measurement_id", "person_id")]
}

DF <- pipeline(dfm)

where:

  • magrittr's %>% pipes the result of the left-hand side in as the first argument of the next function.

  • magrittr's %T>% (the "tee" pipe) evaluates the right-hand side only for its side effect and then passes the left-hand-side value on, so the printed message is not piped into the next step.

  • the dot . at the end refers to the piped object (here the data frame).
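A toy sketch of the three pieces above, assuming magrittr is installed; the named numeric vector `x` is a hypothetical stand-in for the data frame:

```r
library(magrittr)

x <- c(a = 1, b = 4, c = 9)

y <- x %T>%
  # tee pipe: the braced block runs only for its side effect (inside it, . is x),
  # then x itself is passed on unchanged
  {cat("received", length(.), "values\n")} %>%
  sqrt() %>%
  # the dot stands for the piped object, here used with `[` to pick elements
  .[c("a", "c")]

print(y)
```

The message is printed once, and `y` ends up as `c(a = 1, c = 3)`: the `cat()` result never reaches `sqrt()`.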

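Wrap each logging call in curly braces, e.g. {cat("...\n")} or {print("Hello World!")}. Without the braces, magrittr inserts the piped data frame as the first argument: cat() then fails because it cannot handle a data frame, and print() prints the data frame instead of your message.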

If you are willing to give up the debugging messages (or to move them into each individual function), you can use purrr's compose():

library(purrr)   # provides compose()

pipeline <- compose(
  ~ .x[c("measurement_id", "person_id")],
  ~ missing_impute(.x, def_val),
  col_format, range_values, oper_val_concepts,
  measurement_ids, transform_dates, transform_ids
)

DF <- pipeline(dfm)

Note that here the functions are applied right to left (you can apply them left to right with compose(..., .dir = "forward")).
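If you want to keep the messages without repeating a pipe/log pair per step, one base-R option (a swapped-in technique, not part of the answer above) is to store the steps in a named list and fold over it with Reduce(), which applies them in list order, like compose(..., .dir = "forward"). `add_one`, `double_it` and `run_pipeline` are hypothetical names used only for illustration:

```r
# Hypothetical stand-ins for the real preprocessing steps
add_one   <- function(df) { df$value <- df$value + 1; df }
double_it <- function(df) { df$value <- df$value * 2; df }

# Steps run in list order; the names double as log messages
steps <- list(
  "add one"   = add_one,
  "double it" = double_it
)

run_pipeline <- function(df, steps) {
  Reduce(
    function(acc, name) {
      out <- steps[[name]](acc)       # apply this step to the previous result
      message(name, " success")       # log after the step completes
      out
    },
    names(steps),
    init = df                         # the input data frame seeds the fold
  )
}

df  <- data.frame(id = 1:3, value = c(10, 20, 30))
res <- run_pipeline(df, steps)
print(res)
```

For the real pipeline, steps that need extra arguments can be wrapped, e.g. `"Missing value Imputation" = function(df) missing_impute(df, def_val)`, and the final column selection can be appended as one more step.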
