tricky

Reputation: 1553

How exactly are UDFs working in SparkR?

Let's say I defined an R function that takes two numerics as inputs:

effectifTouche <- function(audience, extrapolated) {
  TM <- audience / 1000000
  VE <- extrapolated / 100
  TME <- TM * VE
  nbVis <- TME / 1000000.1
  return(nbVis)
}

It gives me back a score, so I would like to use it as a UDF on two columns of a SparkR DataFrame.

This was working in PySpark, and I was wondering how to do the same thing in SparkR.

So I tried many things in both sparklyr and SparkR, but I can't get this UDF working.

Ideally, I would love to just do this:

df %>%
  dapply(function(p) {
    effectifTouche(p$audience, p$extrapolated)
  })

effectifTouche being my R function, and audience and extrapolated being two columns of the Spark DataFrame.
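From the SparkR docs, my understanding is that `dapply` expects a function over a whole partition (received as a local R `data.frame`) plus an explicit output schema, so presumably the call would need to look something like this (untested sketch; the helper is redefined inside the closure so it ships to the workers along with it):

```r
library(SparkR)

# dapply requires the output schema to be declared up front
schema <- structType(structField("nbVis", "double"))

result <- dapply(df, function(p) {
  # p is a plain R data.frame holding one partition of df
  effectifTouche <- function(audience, extrapolated) {
    TM <- audience / 1000000
    VE <- extrapolated / 100
    (TM * VE) / 1000000.1
  }
  data.frame(nbVis = effectifTouche(p$audience, p$extrapolated))
}, schema)
```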

I will gladly take answers for both libraries, SparkR and sparklyr, because I have tried both and checked every related GitHub issue with no success.
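On the sparklyr side, my understanding is that `spark_apply` is the intended escape hatch for running arbitrary R code over partitions; this is roughly the shape I have been attempting (a sketch, assuming `df` is a sparklyr table with `audience` and `extrapolated` columns):

```r
library(sparklyr)
library(dplyr)

result <- df %>%
  spark_apply(function(p) {
    # the function runs on the workers, so the helper is defined inside it
    effectifTouche <- function(audience, extrapolated) {
      TM <- audience / 1000000
      VE <- extrapolated / 100
      (TM * VE) / 1000000.1
    }
    data.frame(nbVis = effectifTouche(p$audience, p$extrapolated))
  })
```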

Thanks a lot

Edit: another tricky use case

df %>%
  mutate(my_var = as.numeric(strptime(endHour, format = "%H:%M:%S"), unit = "secs"))

Upvotes: 2

Views: 980

Answers (1)

kevinykuo

Reputation: 4772

With simple arithmetic like this, you're probably better off pushing the computation to Spark SQL, e.g.

df %>%
  mutate(TM = audience / 1000000,
         VE = extrapolated / 100,
         TME = TM * VE,
         nbVis = TME / 1000000.1)

If you actually need to use external R packages, we can help you better if you provide a reproducible example of df.
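For the `strptime` use case in the question's edit: `strptime` is plain R and won't translate to Spark SQL, but Spark's own `unix_timestamp` function with a format pattern should pass through a sparklyr `mutate` untranslated. A sketch, assuming `endHour` is an `"HH:MM:SS"` string; note that `unix_timestamp` interprets the value relative to the epoch and the session time zone, so the offset should be verified:

```r
library(sparklyr)
library(dplyr)

# unix_timestamp is a Spark SQL function, evaluated on the cluster;
# with an HH:mm:ss pattern it yields a second-of-day style value
df %>%
  mutate(my_var = unix_timestamp(endHour, "HH:mm:ss"))
```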

Upvotes: 0
