MPTeam

Reputation: 1

SparkR dapply not working

I'm trying to call lapply within a function applied to a Spark DataFrame. According to the documentation, this has been possible since Spark 2.0.

wrapper <- function(df) {
    out <- df
    out$len <- unlist(lapply(df$value, length))
    return(out)
}
# dd is a SparkDataFrame with one column (value) of type raw
dapplyCollect(dd, wrapper)

It returns an error:

Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 37.0 failed 1 times, most recent failure: Lost task 0.0 in stage 37.0 (TID 37, localhost): org.apache.spark.SparkException: R computation failed with
 Error in (function (..., deparse.level = 1, make.row.names = TRUE)  : 
  incompatible types (from raw to logical) in subassignment type fix

The following works fine:

wrapper(collect(dd))

But we want the computation to run on the worker nodes, not on the driver.

What could be the problem? There is a related question, but it does not help. Thanks.

Upvotes: 0

Views: 430

Answers (1)

Ashley Ford

Reputation: 1

You need to supply the schema explicitly: it can only be defaulted when the output columns have the same mode as the input columns.
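Since dapplyCollect() does not accept a schema argument, one way to apply this advice is to switch to dapply() and pass a structType describing the output. A minimal sketch, assuming dd has a single binary (raw) column named value and the new column is an integer named len (both names taken from the question; the schema itself is an assumption):

library(SparkR)

# Assumed output schema: the original binary column plus the new length column
schema <- structType(
    structField("value", "binary"),   # raw vectors from the input
    structField("len", "integer")     # length of each raw vector
)

wrapper <- function(df) {
    df$len <- unlist(lapply(df$value, length))
    df
}

result <- dapply(dd, wrapper, schema)
head(collect(result))

With the schema supplied, Spark no longer has to infer the output column types from the input, which is what breaks when a raw column is involved.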

Upvotes: 0
