I have a set of preprocessing functions that I call in turn to build the final data frame. As you can see, the output of each stage is passed as the input to the next stage.
```r
wrap_func = function(DF){ # Wrapper function which has all the above functions put together
  T1 = transform_ids(DF)
  print("Id transformation success") # print statements are used for debugging purposes
  T2 = transform_dates(T1)
  print("date transformation success")
  T3 = measurement_ids(T2)
  print("measurement_ids transformation success")
  T4 = oper_val_concepts(T3)
  print("operator transformation success")
  T5 = range_values(T4)
  print("range capture success")
  T6 = col_format(T5)
  print("column formatting success")
  T7 = missing_impute(T6, def_val) # def_val is looked up outside the function
  print("Missing value Imputation success")
  T7 = T7[c("measurement_id", "person_id")] # keep only these columns, in this order
  return(T7)
}
DF = wrap_func(dfm)
```
Is there a more elegant way to write this?
This related post has a similar scenario, but it's in Python.
Can you help me make this more elegant in R?
One solution would be:
```r
library(magrittr) # provides %>% and %T>%

pipeline <- function(DF){
  # The braces keep the data frame out of cat()'s arguments;
  # %T>% then passes the data frame on unchanged.
  DF %>%
    transform_ids() %T>%
    {cat("Id transformation success\n")} %>%
    transform_dates() %T>%
    {cat("date transformation success\n")} %>%
    measurement_ids() %T>%
    {cat("measurement_ids transformation success\n")} %>%
    oper_val_concepts() %T>%
    {cat("operator transformation success\n")} %>%
    range_values() %T>%
    {cat("range capture success\n")} %>%
    col_format() %T>%
    {cat("column formatting success\n")} %>%
    missing_impute(def_val) %T>%
    {cat("Missing value Imputation success\n")} %>%
    .[c("measurement_id", "person_id")]
}

DF <- pipeline(dfm)
```
where:

- `magrittr`'s `%>%` passes the result of the left-hand side as the first argument of the next function;
- `magrittr`'s `%T>%` (the "tee" pipe) returns the left-hand side value instead of the right-hand side's result, because you don't want to pipe the return value of `cat()` (which is `NULL`) into the next step;
- the dot `.` at the end refers to the piped object (here the data frame).
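The three pipe mechanics above can be checked in isolation on a toy data frame (the data frame and the message are made up for illustration). The sketch also shows why the logging call needs braces: without them, `%T>%` still inserts the left-hand side as the first argument of `cat()`.

```r
library(magrittr)

toy <- data.frame(a = 1:3, b = c(10, 20, 30))

# %>% inserts the left-hand side as the first argument: same as head(toy, 2)
first_two <- toy %>% head(2)

# %T>% runs the right-hand side for its side effect and returns the
# left-hand side unchanged; the braces keep `.` out of cat()'s arguments
same_toy <- toy %T>% {cat("step done\n")}

# Without braces this becomes cat(toy, "step done\n"), which errors because
# cat() cannot handle list elements (here, columns) longer than one
err <- tryCatch(toy %T>% cat("step done\n"), error = function(e) "error")

# A bare dot stands for the piped object: this is toy[c("b", "a")]
swapped <- toy %>% .[c("b", "a")]
```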
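Whether you use `cat()` or `print()`, put the call between curly braces, e.g. `{print("Hello World!")}`; otherwise `%T>%` passes the data frame as the call's first argument instead of printing only your message.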
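If the repeated logging calls are the main annoyance, one alternative (a base-R technique, not part of this answer) is to keep the steps in a named list and fold the data frame through them with `Reduce()`, reporting each step's name after it runs. This is a minimal sketch, assuming every step takes and returns a data frame; `add_id()`, `double_x()`, `fill_na()` and `run_pipeline()` are hypothetical stand-ins, not the question's functions.

```r
# Hypothetical toy steps standing in for the question's functions
add_id   <- function(df) { df$id <- seq_len(nrow(df)); df }
double_x <- function(df) { df$x <- df$x * 2; df }
fill_na  <- function(df, value) { df$x[is.na(df$x)] <- value; df }

# Each name is the message reported after that step succeeds
steps <- list(
  "Id transformation success"        = add_id,
  "x doubling success"               = double_x,
  "Missing value imputation success" = function(df) fill_na(df, 0)
)

# Pass the data frame through the steps in order, reporting each one
run_pipeline <- function(df, steps) {
  Reduce(function(acc, i) {
    out <- steps[[i]](acc)
    message(names(steps)[i])
    out
  }, seq_along(steps), df)
}

res <- run_pipeline(data.frame(x = c(1, NA, 3)), steps)
```

With the question's functions, the list would hold `transform_ids`, `transform_dates` and so on, with `missing_impute` wrapped as `function(df) missing_impute(df, def_val)`. Using `message()` rather than `cat()` means the log can be silenced with `suppressMessages()`.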
If you are willing to give up the debugging messages (or to move them into the individual functions), you can use `purrr`'s `compose()`:
```r
library(purrr) # provides compose()
pipeline <- compose(~ .x[c("measurement_id", "person_id")], ~ missing_impute(.x, def_val),
                    col_format, range_values, oper_val_concepts, measurement_ids, transform_dates, transform_ids)
DF <- pipeline(dfm)
```
Note that here the functions are applied right to left (`transform_ids()` runs first), but you can apply them left to right with `compose(..., .dir = "forward")`.
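The ordering rule can be checked with two toy functions (`add_one()` and `times_two()` are made up for illustration); the last line also shows a formula being turned into a function, as in the pipeline above.

```r
library(purrr)

add_one   <- function(x) x + 1
times_two <- function(x) x * 2

# Default direction is backward: the right-most function runs first,
# so this computes times_two(add_one(x))
backward <- compose(times_two, add_one)

# With .dir = "forward" the functions run in the order listed,
# so this computes add_one(times_two(x))
forward <- compose(times_two, add_one, .dir = "forward")

# Formulas are converted to functions: sort first, then keep two elements
first_two_sorted <- compose(~ .x[1:2], sort)
```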