Paddy
Paddy

Reputation: 1175

Sorting latitude and longitude by distance and grouping them based on tolerance

Goal


There are two goals:

  1. Sort latitude and longitude by distance
  2. Group the latitude and longitude based on a tolerance

Expected


Latitude and longitude are sorted in order of distance and grouped by tolerance.

Actual


Latitude and longitude are sorted in order of distance but the grouping does not feel right.

What I've tried


I've used the Haversine formula from several StackOverflow answers e.g. example 1, example 2, example 3, and managed to sort the locations by distance (goal #1 - I think 🤔). I attempted to group them using a tolerance via Math.abs(lat - lastLat) < tolerance but I'm not confident it's working or flexible (goal #2).

Code


const locations = [
  { lat: 77.62279999, lng: 12.95248389 },
  { lat: 77.62517676, lng: 12.95027966 },
  { lat: 77.62753442, lng: 12.93745478 },
  { lat: 77.62217671, lng: 12.93353553 },
];

const distance = (lat1, lon1, lat2, lon2) => {
  const radlat1 = (Math.PI * lat1) / 180;
  const radlat2 = (Math.PI * lat2) / 180;

  const theta = lon1 - lon2;
  const radtheta = (Math.PI * theta) / 180;

  let dist =
    Math.sin(radlat1) * Math.sin(radlat2) +
    Math.cos(radlat1) * Math.cos(radlat2) * Math.cos(radtheta);
  dist = Math.acos(dist);
  dist = (dist * 180) / Math.PI;
  dist = dist * 60 * 1.1515;
  dist = dist * 1.609344;

  return dist;
};

const sortedLocationsByDistance = locations.sort(function (a, b) {
  return distance(0, 0, a.lat, a.lng) - distance(0, 0, b.lat, b.lng);
});

console.log('Sorted:', sortedLocationsByDistance);

const groupedLocationsByTolerance = {};

const tolerance = 0.001;

sortedLocationsByDistance.forEach(({ lat, lng }, index) => {
  if (
    Object.keys(groupedLocationsByTolerance).length && 
    Math.abs(lat - sortedLocationsByDistance.slice(-1)[0].lat) < tolerance &&
    Math.abs(lng - sortedLocationsByDistance.slice(-1)[0].lng) < tolerance
  ) {
    groupedLocationsByTolerance[Object.keys(groupedLocationsByTolerance).slice(-1)[0]].push({ lat, lng });
    return;
  }

  groupedLocationsByTolerance[index] = groupedLocationsByTolerance[index] || [];
  groupedLocationsByTolerance[index].push({ lat, lng });
});

console.log('Grouped:', groupedLocationsByTolerance);

Upvotes: 4

Views: 1353

Answers (2)

Paddy
Paddy

Reputation: 1175

This is the solution I ended up at. Please feel free to tweak and refine it as you see fit (post your own to help others). I realise there are a number of ways you could achieve it.

What I've tried


As seen above, the first approach used a center point (0, 0) and measured the distance to each pin via Haversine formula. The next solution used an algorithm that calculated the angle of the ray between each point using trigonometric functions (a big thank you to this answer).

const getSortedByDistance = (
  coordinates,
  centerPoint
) => {
  const points = [...coordinates]; // Copy the array, sort modifies it
  const angles = {};

  // Calculate the angle of the ray between that center point and each point, using inverse trigonometric functions
  points.forEach((point) => {
    angles[getCoordinateKey(point.latitude, point.longitude)] = Math.atan2(
      point.latitude - centerPoint.latitude,
      point.longitude - centerPoint.longitude
    );
  });

  // Filter the order based on the angle
  points.sort(
    (pointA, pointB) =>
      angles[getCoordinateKey(pointA.latitude, pointA.longitude)] -
      angles[getCoordinateKey(pointB.latitude, pointB.longitude)]
  );

  return points;
};

While this grouped coordinates fairly well, it did not guarantee they were in the correct order; making the last group check fail in some circumstances.

What I did


After a bit of thinking, the best option is to use the Haversine formula to measure the distance pin-to-pin; creating accurate grouping. The only downside to this is the performance cost for the full calculation.

const getDistance = (lat1, lon1, lat2, lon2) => {
    const radlat1 = (Math.PI * lat1) / 180;
    const radlat2 = (Math.PI * lat2) / 180;

    const theta = lon1 - lon2;
    const radtheta = (Math.PI * theta) / 180;

    let dist =
        Math.sin(radlat1) * Math.sin(radlat2) +
        Math.cos(radlat1) * Math.cos(radlat2) * Math.cos(radtheta);
    dist = Math.acos(dist);
    dist = (dist * 180) / Math.PI;
    dist = dist * 60 * 1.1515;
    dist = dist * 1.609344;

    return dist;
};

const getGroupsByDistance = (coordinates, tolerance) => {
    const coordinatesToleranceMap = Array(coordinates.length).fill(1); // 1 = true

    return coordinates.reduce(
        (groups, coordinate, index) => {
            // The tolerance map contains true/false (0/1) values, ignore anything with 0
            if (coordinatesToleranceMap[index] === 0) return groups;

            const { latitude, longitude } = coordinate;

            // This will return the current listing because it's the same lat/lng. We don't need to check it's length because it always has one
            const coordinatesWithinTolerance = coordinates.filter(
                (coordinate, index) => {
                    // We're not interested in any listing that have been filtered out already
                    if (coordinatesToleranceMap[index] === 0) {
                        return false;
                    }

                    const isSameLatLng =
                        latitude === coordinate.latitude &&
                        longitude === coordinate.longitude;

                    // Measure distance using Haversine formula
                    const isWithinTolerance =
                        isSameLatLng ||
                        getDistance(
                            latitude,
                            longitude,
                            coordinate.latitude,
                            coordinate.longitude
                        ) <= tolerance;

                    // Ignore the current listing on the next filter
                    if (isWithinTolerance) coordinatesToleranceMap[index] = 0;

                    return isWithinTolerance;
                }
            );

            groups.push({
                latitude,
                longitude,
                properties: coordinatesWithinTolerance
            });

            return groups;
        },
        []
    );
};

const locations = [
  { latitude: 77.62279999, longitude: 12.95248389 },
  { latitude: 77.62279999, longitude: 12.95248389 },
  { latitude: 78.62217671, longitude: 12.93353553 },
  { latitude: 77.62517676, longitude: 12.95027966 },
  { latitude: 77.62753442, longitude: 12.93745478 },
  { latitude: 77.62752442, longitude: 12.93745478 },
  { latitude: 77.62214671, longitude: 12.93353553 },
];

const tolerance = 0.01;

const t0 = performance.now();
const groupsByDistance = getGroupsByDistance(locations, tolerance);
const t1 = performance.now();

console.log('Time:', t1 - t0);
console.log(groupsByDistance);

Performance


The function will take longer to execute as the dataset grows. It needs chunking down in order to reduce the execution time. Alternatively, reducing the amount of looping could lead to a significant improvement.

Improvements


If anybody can come up with a better answer (improvement/new logic) that is performant, please add it below.

Upvotes: 1

Trentium
Trentium

Reputation: 3719

For very large sets, the number of iterations can be reduced significantly by a combination of both(!) of your answers. The logic is thus...

  • O( n ) - Calculate the distance of every lat/long from a fixed point. In the example below, the origin lat/long of ( 0, 0 ) is used as the fixed point.

  • O( n log n ) - Sort this array by ascending distance.

  • O( nm ) - Now, similar to a bubble sort, walk this array with a nested pair of loops such that while the inner loop distance from origin is still within the tolerance of the outer loop distance from origin, perform the expensive getDistance check to determine if the coordinates are indeed within tolerance. Continue in the inner loop while the distance from origin difference is still within the tolerance, otherwise break from the inner loop, as there is no sense in comparing the distances now known to be further apart than the tolerance. In this way, the algorithm is only checking a handful of distances that are within range of the tolerance.

The "m" in O( nm ) is the average number of locations within tolerance, and for large numbers of locations, should be relatively small compared to "n", assuming some distribution of the locations and a small tolerance. For large tolerances, of course, "m" will approach "n" and then the algorithm is performing at O( n^2 )...

const locations = [
  { latitude: 77.62279999, longitude: 12.95248389 },
  { latitude: 77.62279999, longitude: 12.95248389 },
  { latitude: 78.62217671, longitude: 12.93353553 },
  { latitude: 77.62517676, longitude: 12.95027966 },
  { latitude: 77.62753442, longitude: 12.93745478 },
  { latitude: 77.62752442, longitude: 12.93745478 },
  { latitude: 77.62214671, longitude: 12.93353553 },
];

const getDistance = (lat1, lon1, lat2, lon2) => {
    const radlat1 = (Math.PI * lat1) / 180;
    const radlat2 = (Math.PI * lat2) / 180;

    const theta = lon1 - lon2;
    const radtheta = (Math.PI * theta) / 180;

    let dist =
        Math.sin(radlat1) * Math.sin(radlat2) +
        Math.cos(radlat1) * Math.cos(radlat2) * Math.cos(radtheta);
    dist = Math.acos(dist);
    dist = (dist * 180) / Math.PI;
    dist = dist * 60 * 1.1515;
    dist = dist * 1.609344;

    return dist;
};


for ( let loc of locations ) {
  loc.distanceFromOrigin = getDistance( loc.latitude, loc.longitude, 0, 0 );
}

console.log( 'Sorted by closest to origin:' );
locations.sort( ( a, b ) => a.distanceFromOrigin - b.distanceFromOrigin );
console.log( locations );

let tolerance = 0.01;
let results = [];

for ( let i = 0; i < locations.length; i++ ) {
  let loci = locations[ i ];
  for ( let j = i + 1; j < locations.length; j++ ) {
    let locj = locations[ j ];
    if ( locj.distanceFromOrigin - loci.distanceFromOrigin <= tolerance ) {
      if ( getDistance( loci.latitude, loci.longitude, locj.latitude, locj.longitude ) <= tolerance ) {
        loci.properties = loci.properties ?? [];
        loci.properties.push( locj );
      }
    } else {
      break;
    }
  }
  if ( loci.properties ) {
    results.push( loci );
  }
}

console.log( `Closest by tolerance of ${tolerance}:` );
console.log( results );
    

Upvotes: 1

Related Questions