Reputation: 1553
Let's say I defined an R function that takes two numerics as inputs:
effectifTouche <- function(audience, extrapolated) {
  TM <- audience / 1000000
  VE <- extrapolated / 100
  TME <- TM * VE
  nbVis <- TME / 1000000.1
  return(nbVis)
}
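For context, the function is plain vectorized arithmetic, so it can be sanity-checked locally before involving Spark at all (the function is restated here so the snippet is self-contained; the input values are made up):

```r
effectifTouche <- function(audience, extrapolated) {
  TM  <- audience / 1000000
  VE  <- extrapolated / 100
  TME <- TM * VE
  return(TME / 1000000.1)
}

# Vectorized: works element-wise on whole columns at once
effectifTouche(c(2000000, 5000000), c(50, 80))
```

Because base R arithmetic is vectorized, the same call works on two columns of a local data.frame without any apply machinery.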
It gives me back a score, and I would like to use it as a UDF on two columns of a SparkR DataFrame.
This worked in PySpark, and I was wondering how to do the same in SparkR.
I have tried many things in both sparklyr and SparkR, but I can't get this UDF working.
Ideally, I would love to just do this:
df %>%
  dapply(df_join,
         function(p) { effectifTouche(p$audience, p$extrapolated) })
Here effectifTouche is my R function, and audience and extrapolated are two columns of the Spark DataFrame.
I will gladly take answers for either library, SparkR or sparklyr, because I have tried both and checked every related GitHub issue with no success.
Thanks a lot
Edit: here is another tricky use case:
df %>%
  mutate(my_var = as.numeric(strptime(endHour, format = "%H:%M:%S"), unit = "secs"))
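Note that strptime attaches the current date, so converting its result with as.numeric yields seconds since the epoch rather than seconds since midnight, and strptime does not translate to Spark SQL in any case. A minimal local sketch of the seconds-since-midnight computation in plain base R (hms_to_secs is a hypothetical helper name, not part of any package):

```r
# Seconds since midnight from "HH:MM:SS" strings, in plain base R
hms_to_secs <- function(x) {
  parts <- matrix(as.numeric(unlist(strsplit(x, ":", fixed = TRUE))),
                  ncol = 3, byrow = TRUE)
  parts[, 1] * 3600 + parts[, 2] * 60 + parts[, 3]
}

hms_to_secs(c("00:00:30", "01:30:15"))
# 30 5415
```

On the Spark side, a similar result can often be obtained with Hive's unix_timestamp(endHour, "HH:mm:ss") inside mutate, since time-only strings are parsed against 1970-01-01, but verify that against your Spark version before relying on it.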
Upvotes: 2
Views: 980
Reputation: 4772
With simple arithmetic like this you're probably better off pushing the computation to Spark SQL, e.g.:
df %>%
  mutate(TM = audience / 1000000,
         VE = extrapolated / 100,
         TME = TM * VE,
         nbVis = TME / 1000000.1)
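If you really do need to run arbitrary R code on each row (e.g. code that depends on an R package), sparklyr's spark_apply can ship a function to the workers, where each partition arrives as a plain data.frame. A minimal sketch, assuming a local Spark installation; sc, sdf, and the sample values are placeholders for your setup:

```r
library(sparklyr)
library(dplyr)

# Hypothetical local connection; use your own master URL in practice
sc <- spark_connect(master = "local")

sdf <- copy_to(sc, data.frame(audience     = c(2000000, 5000000),
                              extrapolated = c(50, 80)))

# spark_apply runs the closure on each partition as a data.frame.
# The arithmetic is inlined rather than calling an externally defined
# function, since the closure must be serializable to the workers.
result <- sdf %>%
  spark_apply(function(p) {
    p$nbVis <- (p$audience / 1000000) * (p$extrapolated / 100) / 1000000.1
    p
  })

collect(result)

spark_disconnect(sc)
```

Be aware that spark_apply serializes data to R processes on the workers, so it is much slower than the mutate translation above; reserve it for logic that genuinely cannot be expressed in Spark SQL.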
If you actually need to use external R packages, we can help you better if you provide an example of df.
Upvotes: 0