Chandan

Reputation: 764

Distance between all pairs of points in Apache Spark

I have a text file with 470 latitude and longitude values. I want to calculate the distance between every pair of points. Could anyone tell me how to do this in Apache Spark, using Java as the programming language?

~Regards, Chandan

Upvotes: 0

Views: 2013

Answers (2)

ale64bit

Reputation: 6242

Just to complement @Holden's answer, here's a Java snippet illustrating the idea. The code assumes that you have a file where each line consists of a latitude and a longitude value separated by spaces.

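import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.DoubleFunction;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

// sc is assumed to be an existing JavaSparkContext
JavaRDD<String> input = sc.textFile("/path/to/your/file");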

// map each line to pairs of Double, representing the points
JavaPairRDD<Double, Double> points = input.mapToPair(
      new PairFunction<String, Double, Double>() {
          public Tuple2<Double, Double> call(String s) throws Exception {
              // trim first so leading whitespace does not yield an empty first token
              String[] parts = s.trim().split("\\s+");
              return new Tuple2<>(
                      Double.parseDouble(parts[0]),
                      Double.parseDouble(parts[1]));
          }
      }
);

// then, get the cartesian product of the point set, and map
// each resulting pair of points to the distance between them
JavaDoubleRDD distances = points.cartesian(points).mapToDouble(new DoubleFunction<Tuple2<Tuple2<Double, Double>, Tuple2<Double, Double>>>() {
  public double call(Tuple2<Tuple2<Double, Double>, Tuple2<Double, Double>> pointPair) throws Exception {
      Double lat1 = pointPair._1()._1();
      Double lon1 = pointPair._1()._2();
      Double lat2 = pointPair._2()._1();
      Double lon2 = pointPair._2()._2();
      return dist(lat1, lon1, lat2, lon2); // omitted for clarity
  }
});

// then, do something with your distances
distances.foreach(new VoidFunction<Double>() {
  public void call(Double aDouble) throws Exception {
      System.out.println("D: " + aDouble);
  }
});

Of course, if you need to keep track of which pair of points each distance belongs to, map to a pair instead: the pair of points as the first element and the distance as the second.
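The dist helper was left out of the snippet. As a rough sketch of what it could look like, assuming you want the great-circle distance in kilometres on a spherical Earth (the haversine formula), something like the following should work. The class name Haversine and the radius constant are my own choices for illustration, not anything Spark provides; with this class, the call in the snippet would become Haversine.dist(lat1, lon1, lat2, lon2).

```java
public final class Haversine {
    // Mean Earth radius in kilometres (an assumed constant; the Earth is
    // treated as a perfect sphere, so results are approximate).
    private static final double EARTH_RADIUS_KM = 6371.0088;

    private Haversine() {}

    // Great-circle distance in kilometres between two points given in degrees.
    public static double dist(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);

        double sinHalfPhi = Math.sin(dPhi / 2);
        double sinHalfLambda = Math.sin(dLambda / 2);
        double a = sinHalfPhi * sinHalfPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfLambda * sinHalfLambda;
        // Clamp to 1 so rounding error cannot make sqrt(1 - a) return NaN.
        a = Math.min(1.0, a);

        double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
        return EARTH_RADIUS_KM * c;
    }
}
```

Since the method is static and holds no state, it can be called from inside the anonymous function classes without any serialization concerns.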

Hope it helped. Cheers!

Upvotes: 0

Holden

Reputation: 7452

You can take your RDD of points and call the cartesian function on it with itself as the argument. This returns an RDD containing every pair of points, which you can then map over to compute the distance for each pair.
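One thing to be aware of: the cartesian product of a set with itself contains every ordered pair, so each point is paired with itself and each unordered pair shows up twice. The following is only a plain-Java sketch (no Spark involved; PairDemo and its method names are made up for illustration) of what that means for the pair counts. In Spark, one possible way to apply the same i < j filter would be to attach an index to each point first, for example with zipWithIndex, and filter on the indices after cartesian.

```java
import java.util.ArrayList;
import java.util.List;

public final class PairDemo {
    private PairDemo() {}

    // Full cartesian product of indices 0..n-1 with itself: n * n pairs,
    // including (i, i) and both (i, j) and (j, i).
    public static List<int[]> cartesian(int n) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                pairs.add(new int[] {i, j});
            }
        }
        return pairs;
    }

    // Keep only pairs with i < j: drops self-pairs and mirrored duplicates,
    // leaving n * (n - 1) / 2 unordered pairs.
    public static List<int[]> uniquePairs(int n) {
        List<int[]> unique = new ArrayList<>();
        for (int[] p : cartesian(n)) {
            if (p[0] < p[1]) {
                unique.add(p);
            }
        }
        return unique;
    }
}
```

For the 470 points in the question, the full product has 470 * 470 = 220900 pairs, of which 470 * 469 / 2 = 110215 are distinct unordered pairs.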

Upvotes: 1
