Reputation: 1
I'm trying to call lapply inside a function that is applied to a Spark DataFrame. According to the documentation, this has been possible since Spark 2.0.
wrapper = function(df){
out = df
out$len <- unlist(lapply(df$value, function(y) length(y)))
return(out)
}
# dd is Spark Data Frame with one column (value) of type raw
dapplyCollect(dd, wrapper)
It returns an error:
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...): org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 37.0 failed 1 times, most recent failure: Lost task 0.0 in stage 37.0 (TID 37, localhost): org.apache.spark.SparkException: R computation failed with
Error in (function (..., deparse.level = 1, make.row.names = TRUE) :
incompatible types (from raw to logical) in subassignment type fix
The following works fine:
wrapper(collect(dd))
But we want the computation to run on the worker nodes, not on the driver.
What could be the problem? There is a related question, but it does not help. Thanks.
Upvotes: 0
Views: 430
Reputation: 1
You need to supply the output schema explicitly. It can only be defaulted when the output columns have the same mode as the input columns.
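A minimal sketch of what this could look like. dapplyCollect takes no schema argument, so it assumes the suggestion means switching to dapply followed by collect. The names add_len and run_on_spark are hypothetical, and the sketch assumes the binary column reaches the worker as a list of raw vectors. It returns only the new integer column to keep the schema simple. Whether this resolves the error in the question has not been confirmed here.

```r
# Worker-side function: receives each partition as a plain R data.frame.
# Assumption: the binary column `value` arrives as a list of raw vectors.
add_len <- function(df) {
  data.frame(len = lengths(df$value))
}

# Hypothetical helper (not from the question): runs add_len on the workers.
# dapply() takes the output schema explicitly, then collect() brings the
# result back to the driver.
run_on_spark <- function(dd) {
  schema <- SparkR::structType(SparkR::structField("len", "integer"))
  SparkR::collect(SparkR::dapply(dd, add_len, schema))
}

# Local check of the worker function on an ordinary data.frame, no Spark needed.
local_df <- data.frame(id = 1:2)
local_df$value <- list(as.raw(c(1, 2, 3)), as.raw(4))
print(add_len(local_df))
```

With a running SparkR session, run_on_spark(dd) would be called in place of dapplyCollect(dd, wrapper).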
Upvotes: 0