Cyrus Mohammadian

Reputation: 5193

apply if else mutate function on all columns of a spark dataframe in sparklyr

How can one apply an if-else mutate function to all columns of a Spark dataframe in sparklyr? For example, say I want to convert all values less than 2 in the iris dataframe to 0. Outside of sparklyr there are a number of ways to do this, but with sparklyr it seems more complex. I tried one such way using the following custom function:

iris_sdf <- sdf_copy_to(sc, iris, overwrite = TRUE)
iris_num_sdf <- iris_sdf %>% select(-Species)

recode_val <- function(x) ifelse(x < 2, 0, x)

iris_num_sdf %>% mutate_all(funs(recode_val))

But I got an org.apache.spark.sql.AnalysisException error: This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7

I tried the following using spark_apply but got nonsensical results.

iris_num_sdf %>% 
  spark_apply(recode_val, context = {colName <- colnames(iris_num_sdf)})

I also tried the approach below, which seems to do the trick, but I am hoping for something more elegant.

convert_x <- function(col){
  col <- sym(col)
  iris_num_sdf %>% mutate({{col}} := ifelse({{col}} < 2, 0, {{col}})) %>% select({{col}})
}

col_list <- colnames(iris_num_sdf)
out <- lapply(col_list, convert_x)

do.call(sdf_bind_cols, out)

Upvotes: 0

Views: 906

Answers (1)

Ronak Shah

Reputation: 388982

You can try this approach -

library(dplyr)

convert_x <- function(col){
  iris_num_sdf %>% transmute({{col}} := ifelse(.data[[col]] < 2, 0, .data[[col]]))
}

col_list <- colnames(iris_num_sdf)
result <- purrr::map_dfc(col_list, convert_x)
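
A possibly simpler variant is to write the condition inline with across(). The error in the question arises because dbplyr passes a function it cannot translate (here recode_val) straight through to SQL, where Spark then looks for a SQL function of that name. The sketch below uses a local copy of iris as a stand-in (iris_num is a hypothetical name) so it runs without a Spark connection. It assumes a sparklyr/dbplyr version recent enough to translate across(), in which case the same mutate() call should work on iris_num_sdf.

```r
library(dplyr)

# Local stand-in for iris_num_sdf so the sketch runs without Spark
iris_num <- iris %>% select(-Species)

# The condition is written inline, so no user-defined R function
# has to be resolved on the Spark side
recoded <- iris_num %>%
  mutate(across(everything(), ~ ifelse(.x < 2, 0, .x)))

head(recoded)
```

This avoids building one single-column table per column and binding them back together.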

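A base R option, which needs the data collected into a local data frame first -

recode_val <- function(x) ifelse(x < 2, 0, x)
iris_num_local <- collect(iris_num_sdf)
# cbind keeps one column per variable; rbind would transpose the result
out <- do.call(cbind, lapply(iris_num_local, recode_val))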
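
As for the spark_apply attempt in the question: spark_apply hands each partition to the function as an R data frame, not as individual columns, which is likely why a vector-style ifelse() gave odd results there. A minimal sketch of a function written for a whole data frame follows. The names recode_df and local_num are hypothetical, and the function is exercised on local data so it runs without a cluster.

```r
# Recode a whole data frame at once: values below 2 become 0
recode_df <- function(df) {
  df[df < 2] <- 0
  df
}

# Local check on the numeric columns of iris
local_num <- iris[, 1:4]
recoded <- recode_df(local_num)
head(recoded)

# With a live connection, the assumed usage would be:
# iris_num_sdf %>% spark_apply(recode_df)
```

Because spark_apply ships the data to R worker processes, the across() or transmute() approaches, which stay in Spark SQL, are probably the better fit for a simple recode like this.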

Upvotes: 1
