Bendriss Jaâfar

Reputation: 78

Calculating distance using latitude longitude coordinates in kilometers with Spark 2 Scala

I'm trying to calculate the distance in kilometers between two geographical coordinates using the haversine formula, with Spark 2.3 and Scala 2.11.8.

For each user, I want to compute the distance between two consecutive movements.

I have the longitude and latitude of each position, and the goal is to get the distance in km.

+-----------+------------------+------------------+-----------------+
|       user|          distance|Longitude_Centroid|Latitude_Centroid|
+-----------+------------------+------------------+-----------------+
|      -2525|              null| 7.038245640847997|39.48919886182785|
|      -2147|12818.567585128396| 7.038245640847997|39.48919886182785|
|      -2147|12818.567585128396| 7.038245640847997|39.48919886182785|
|      -2525|12862.278795753988| 7.050538333095536|39.49362379246508|
+-----------+------------------+------------------+-----------------+

This worked fine for me with a Python DataFrame, but I am struggling to do the same in Spark with Scala.

I used the following code, but it does not seem to work properly:

df4
  .withColumn("a",
    pow(sin((lag($"Latitude_Centroid", 1).over(window) - $"Latitude_Centroid") / 2), 2) +
      cos(($"Latitude_Centroid")) *
      cos((lag($"Latitude_Centroid", 1).over(window)) *
        pow(sin((lag($"Longitude_Centroid", 1).over(window) - $"Longitude_Centroid") / 2), 2)))
  .withColumn("distance", atan2(sqrt($"a"), sqrt(-$"a" + 1)) * 2 * 6371)
  .select("imei", "distance", "Longitude_Centroid", "Latitude_Centroid")
  .show(50)

Upvotes: 1

Views: 1927

Answers (1)

Bendriss Jaâfar

Reputation: 78

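Just found the solution. The coordinates are in degrees, so they have to be converted with toRadians before being passed to sin and cos. In my original attempt the second cos also wrapped the whole product with the longitude term because of a misplaced parenthesis. Computing the lagged latitude and longitude as separate columns first makes the formula much easier to read: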

df4
  .withColumn("lat_lag", lag($"Latitude_Centroid", 1).over(window))
  .withColumn("lng_lag", lag($"Longitude_Centroid", 1).over(window))
  .select("imei", "lat_lag", "lng_lag", "date_from", "Longitude_Centroid", "Latitude_Centroid")
  .withColumn("a",
    pow(sin(toRadians($"Latitude_Centroid" - $"lat_lag") / 2), 2) +
      cos(toRadians($"lat_lag")) * cos(toRadians($"Latitude_Centroid")) *
        pow(sin(toRadians($"Longitude_Centroid" - $"lng_lag") / 2), 2))
  .withColumn("distance", atan2(sqrt($"a"), sqrt(-$"a" + 1)) * 2 * 6371)
  .select("imei", "lat_lag", "lng_lag", "date_from", "Longitude_Centroid", "Latitude_Centroid", "distance")
  .show()
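
To sanity-check the values in the distance column, the same formula can be written in plain Scala without Spark. This is only a minimal sketch: the names HaversineCheck and haversineKm are made up for illustration, and 6371 km is the same mean Earth radius used in the column expression above.

```scala
object HaversineCheck {
  // Mean Earth radius in kilometres (same constant as in the Spark expression).
  val EarthRadiusKm = 6371.0

  /** Great-circle distance in km between two points given in decimal degrees.
    * Hypothetical helper, mirroring the column expression term by term. */
  def haversineKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    // Differences are converted to radians before any trigonometry.
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.atan2(math.sqrt(a), math.sqrt(1 - a))
  }

  def main(args: Array[String]): Unit = {
    // The two distinct centroids from the sample rows in the question.
    val d = haversineKm(39.48919886182785, 7.038245640847997,
                        39.49362379246508, 7.050538333095536)
    println(f"$d%.3f km")
  }
}
```

For the two distinct centroids in the sample table this should come out at a little over one kilometre, which gives a quick reference point when comparing against what Spark returns for the same pair of rows.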

Upvotes: 3
