getitout

Too many arguments for method in UDF (Scala)

I have a UDF for calculating the distance between two coordinates:

import org.apache.spark.sql.functions._
import scala.math._

def calculateDistance(la1: Double, lo1: Double, la2: Double, lo2: Double): Double => udf(
  {
    val R = 6373.0
    val lat1 = toRadians(la1)
    val lon1 = toRadians(lo1)
    val lat2 = toRadians(la2)
    val lon2 = toRadians(lo2)

    val dlon = lon2 - lon1
    val dlat = lat2 - lat1

    val a = pow(sin(dlat / 2), 2) + cos(lat1) * cos(lat2) * pow(sin(dlon / 2), 2)
    val c = 2 * atan2(sqrt(a), sqrt(1 - a))

    val distance = R * c
  }
)
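For reference, the body above is the haversine formula for great-circle distance. Writing φ for latitude and λ for longitude (both in radians, as returned by toRadians), Δφ = φ2 − φ1 (dlat), Δλ = λ2 − λ1 (dlon), and taking R = 6373.0 km from the code, it computes the following, where d corresponds to distance:

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1\,\cos\varphi_2\,\sin^2\!\left(\frac{\Delta\lambda}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1-a}\right) \\
d &= R\,c
\end{aligned}
```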

Here is the DataFrame schema:

dfcity: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [Name: string, LAT: double ... 10 more fields]
root
|-- SCITY: string (nullable = true)
|-- LAT: double (nullable = true)
|-- LON: double (nullable = true)
|-- ADD: integer (nullable = true)
|-- CODEA: integer (nullable = true)
|-- CODEB: integer (nullable = true)
|-- TCITY: string (nullable = true)
|-- TLAT: double (nullable = true)
|-- TLON: double (nullable = true)
|-- TADD: integer (nullable = true)
|-- TCODEA: integer (nullable = true)
|-- TCODEB: integer (nullable = true)

When I try to use withColumn:

val dfcitydistance = dfcity.withColumn("distance", calculateDistance($"LAT", $"LON",$"TLAT", $"TLON"))
it generates the following error:
6: error: too many arguments for method calculateDistance: (distance: Double)

What's wrong with the way I'm passing the columns to the UDF? Please advise. Thank you very much.


Answers (2)

Leo C

There are a couple of issues with your code:

def calculateDistance(la1:Double, lo1:Double, la2:Double, lo2:Double): Double => udf( {
  // ...
  val distance = R * c
} )
  1. To create a UDF, you should pass the entire Scala function (a function literal) as the argument to the method udf.
  2. In Scala, the last expression in a function body is what the function returns. The statement val distance = R * c is a value definition (an assignment), so a block that ends with it evaluates to Unit. You should either append a line containing just distance, or simply replace the definition with the expression R * c.
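Point 2 can be checked in plain Scala, without Spark. A minimal sketch (the object and value names are hypothetical): the two function literals below differ only in whether the block ends with a val definition or with an expression, and the declared result types show what each block evaluates to.

```scala
object LastExpressionDemo {
  // The block ends with a val definition, so its value (and the
  // function's result) is Unit, not the computed Double
  val broken: Double => Unit = (x: Double) => {
    val distance = x * 2
  }

  // The block ends with the expression `distance`, so the function
  // returns that Double
  val fixed: Double => Double = (x: Double) => {
    val distance = x * 2
    distance
  }
}
```

Declaring broken as Double => Double instead would fail to compile, because its body has type Unit.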

Your UDF should look like the following:

val calculateDistance = udf( (la1:Double, lo1:Double, la2:Double, lo2:Double) => {
  // ...
  R * c
} )
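A minimal sketch of the complete function literal, assuming the elided body is exactly the question's haversine code. The names DistanceSketch and calculateDistanceFn are hypothetical; the literal is kept as a plain Scala value so it can be tested without a Spark session, and the Spark wiring is shown only in comments.

```scala
import scala.math.{atan2, cos, pow, sin, sqrt, toRadians}

object DistanceSketch {
  // Earth radius in km, the value used in the question
  val R = 6373.0

  // The question's computation as a function literal taking four Doubles;
  // the last expression, R * c, is the returned value
  val calculateDistanceFn: (Double, Double, Double, Double) => Double =
    (la1: Double, lo1: Double, la2: Double, lo2: Double) => {
      val lat1 = toRadians(la1)
      val lon1 = toRadians(lo1)
      val lat2 = toRadians(la2)
      val lon2 = toRadians(lo2)

      val dlon = lon2 - lon1
      val dlat = lat2 - lat1

      val a = pow(sin(dlat / 2), 2) + cos(lat1) * cos(lat2) * pow(sin(dlon / 2), 2)
      val c = 2 * atan2(sqrt(a), sqrt(1 - a))
      R * c
    }

  // With Spark on the classpath, this literal is what gets wrapped:
  //   import org.apache.spark.sql.functions.udf
  //   import spark.implicits._ // for the $"..." column syntax
  //   val calculateDistance = udf(calculateDistanceFn)
  //   dfcity.withColumn("distance", calculateDistance($"LAT", $"LON", $"TLAT", $"TLON"))
}
```

Keeping the literal as its own val means it can be unit-tested directly, and udf(...) only needs to be applied once at the Spark boundary.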

user10168234

It should be:

val calculateDistance = udf((la1:Double, lo1:Double,la2:Double,lo2:Double) => {
  ...
})

The function you currently define is a method that takes local values (not Columns) and returns a nullary UDF.
