I have an R function that takes more than one argument and uses dplyr functions internally.
Now I want to apply this UDF to a Spark data frame.
Sample code:
myfun=function(objdf,x,y,k){
  f <- function(x1,y1,x2,y2) {
    d=(x2-x1) + (y2-y1)
  }
  search=function(df,x,y,k){
    df1=data.frame(cbind(df,f(x,y,df$xx,df$yy)))
    colnames(df1)=c(colnames(df),"val")
    colnames(df1)
    new_df=df1 %>% arrange(val) %>% head(k)
    return(new_df)
  }
  searchwithk <- function(x,y,k) {
    force(x,y,k);
    function(df) search(df,x,y,k)
  }
  res <- spark_apply(objdf, function(df) {
    searchwithk(df,x,y,k)
  })
  return(res)
}
# df is a Spark data frame
x=12.12
y=-74.5
k=5
result=myfun(df,x,y,k)
result
It gives me a long error about unused arguments in the force statement.
How can I resolve this?
Upvotes: 0
To add to user9908499's answer: you can effectively pass as many arguments as you want to a two-parameter function by bundling the values into a list and passing it through the context parameter of spark_apply.
For example,
searchwithk <- function(df, context) # these two parameters are the only ones you need
{
  library(dplyr) # load any other libraries you need here
  x <- context$x; y <- context$y; k <- context$k
  search(df, x, y, k) # must return a data frame; replace with your own code
}
res <- df %>% spark_apply(searchwithk,
context = list(x = x, y = y, k = k)) # put as much as you want in this context
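Before handing a worker like this to spark_apply, it can help to call it directly on a small local data.frame with the same list you would pass as context, to confirm that it returns a data frame rather than a function. The sketch below assumes numeric columns xx and yy as in the question, inlines the question's f and search logic inside the worker, and swaps in base R's order() for dplyr::arrange() so it runs without extra packages. The sample data (local_df) is made up for illustration.

```r
# Worker with the (df, context) signature used by spark_apply() when
# `context` is supplied; helpers are defined inside so they travel with it.
searchwithk <- function(df, context) {
  x <- context$x; y <- context$y; k <- context$k
  f <- function(x1, y1, x2, y2) (x2 - x1) + (y2 - y1)
  df$val <- f(x, y, df$xx, df$yy)
  # Sort ascending by val and keep the first k rows (base R instead of dplyr)
  head(df[order(df$val), , drop = FALSE], k)
}

# Hypothetical local data standing in for one Spark partition
local_df <- data.frame(xx = c(13, 12.5, 15, 11), yy = c(-74, -74.4, -70, -75))
ctx <- list(x = 12.12, y = -74.5, k = 2)
out <- searchwithk(local_df, ctx)
print(out)

# On Spark (requires a connection; not run here):
# res <- sparklyr::spark_apply(df, searchwithk, context = ctx)
```

Because the worker returns a plain data frame, the same function can be passed to spark_apply unchanged once the local check looks right.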
Upvotes: 0
it gives me long error / unused parameter in force statement
force is a unary function; you cannot pass it multiple arguments at once:
searchwithk <- function(x,y,k) {
force(x)
force(y)
force(k)
function(df) search(df,x,y,k)
}
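The sketch below shows both why force(x, y, k) fails and what forcing buys you in a closure factory such as searchwithk. It uses hypothetical make_adder_lazy and make_adder_forced functions as stand-ins, so it runs in plain base R.

```r
# force() is defined as function(x) x, so it accepts exactly one argument
err <- tryCatch(force(1, 2, 3), error = function(e) conditionMessage(e))
print(err)  # reports unused arguments

# Without force(), n stays an unevaluated promise until the closure runs
make_adder_lazy <- function(n) function(v) v + n
# With force(), n is evaluated when the factory itself is called
make_adder_forced <- function(n) {
  force(n)
  function(v) v + n
}

n_val <- 1
lazy <- make_adder_lazy(n_val)
forced <- make_adder_forced(n_val)
n_val <- 10        # change the variable after building both closures
print(lazy(0))     # 10: the promise is evaluated only now
print(forced(0))   # 1: the value was captured at creation time
```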
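Also, f does return its result, but only invisibly, because its last expression is the assignment to d. It is clearer to return the expression directly: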
f <- function(x1,y1,x2,y2) {
(x2-x1) + (y2-y1)
}
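An R function returns the value of its last evaluated expression, so the original f still produces the sum; the assignment just makes it invisible at the console. A minimal check, with hypothetical names f_assign and f_expr for the two versions:

```r
f_assign <- function(x1, y1, x2, y2) {
  d = (x2 - x1) + (y2 - y1)  # value of the assignment is returned invisibly
}
f_expr <- function(x1, y1, x2, y2) {
  (x2 - x1) + (y2 - y1)      # value is returned visibly
}

a <- f_assign(0, 0, 3, 4)
b <- f_expr(0, 0, 3, 4)
print(c(a, b))                                  # both are 7
print(withVisible(f_assign(0, 0, 3, 4))$visible) # FALSE
print(withVisible(f_expr(0, 0, 3, 4))$visible)   # TRUE
```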
dplyr functions will not be in scope inside the closure when it runs on the Spark workers, so you'll probably need to load the package inside the function:
search=function(df,x,y,k){
library(dplyr)
...
}
You also call searchwithk incorrectly and pass it the wrong object: searchwithk(x, y, k) returns a function, which should then be applied to df. It should be
searchwithk(x,y,k)(df)
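The two-step call can be checked locally before involving Spark. This sketch reuses the question's helpers with the fixes above, swaps base R's order() in for dplyr::arrange() so no packages are needed, and runs on a made-up local_df; the spark_apply line is left as a comment because it needs a live connection.

```r
# Question's helpers, with base R order() standing in for dplyr::arrange()
f <- function(x1, y1, x2, y2) (x2 - x1) + (y2 - y1)
search <- function(df, x, y, k) {
  df$val <- f(x, y, df$xx, df$yy)
  head(df[order(df$val), , drop = FALSE], k)
}
searchwithk <- function(x, y, k) {
  force(x); force(y); force(k)
  function(df) search(df, x, y, k)
}

# searchwithk(x, y, k) builds a one-argument function; (df) then calls it
local_df <- data.frame(xx = c(13, 12.5, 15, 11), yy = c(-74, -74.4, -70, -75))
top <- searchwithk(12.12, -74.5, 3)(local_df)
print(top)

# Inside myfun, the spark_apply() call would then be (not run here):
# res <- sparklyr::spark_apply(objdf, function(df) searchwithk(x, y, k)(df))
```

Keep in mind that spark_apply calls the function once per partition, so head(k) keeps up to k rows from each partition rather than k rows overall; if you need a single global top k, you may have to sort and take the first k rows again on the combined result.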
There may be other issues as well.
Upvotes: 2