user3520634
user3520634

Reputation: 301

Algorithm for finding all points within distance of another point

I had this problem for an entry test for a job. I did not pass the test. I am disguising the question in deference to the company.

Imagine you have N number of people in a park of A X B space. If a person has no other person within 50 feet, he enjoys his privacy. Otherwise, his personal space is violated. Given a set of (x, y), how many people will have their space violated?

For example, give this list in Python:

people = [(0,0), (1,1), (1000, 1000)]

We would find 2 people who are having their space violated: 1, 2.

We don't need to find all sets of people; just the total number of unique people.

You can't use a brute method to solve the problem. In other words, you can't use a simple array within an array.

I have been working on this problem off and on for a few weeks, and although I have gotten a solution faster than n^2, have not come up with a problem that scales.

I think the only correct way to solve this problem is by using Fortune's algorithm?

Here's what I have in Python (not using Fortune's algorithm):

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability
TEST = True

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

def _get_distance(x1, y1, x2, y2):
    """
    require: x1, y1, x2, y2: all integers
    return: a distance as a float
    """
    distance = math.sqrt(math.pow((x1 - x2), 2) + math.pow((y1 - y2),2))
    return distance

def check_real_distance(people1, people2, conflict_radius):
    """
    determine if two people are too close

    """
    if people2[1] - people1[1] > conflict_radius:
        return False
    d = _get_distance(people1[0], people1[1], people2[0], people2[1])
    if d >= conflict_radius:
        return False
    return True

def check_for_conflicts(peoples, conflict_radius):
    # sort  people
    def sort_func1(the_tuple):
        return the_tuple[0]
    _peoples = []
    index = 0
    for people in peoples:
        _peoples.append((people[0], people[1], index))
        index += 1
    peoples = _peoples
    peoples = sorted(peoples, key = sort_func1)
    conflicts_dict = {}
    i = 0
    # use a type of sweep strategy
    while i < len(peoples) - 1:
        x_len = peoples[i + 1][0] - peoples[i][0]
        conflict = False
        conflicts_list =[peoples[i]]
        j = i + 1
        while x_len <= conflict_radius and j < len(peoples):
            x_len = peoples[j][0] - peoples[i][0]
            conflict = check_real_distance(peoples[i], peoples[j], conflict_radius)
            if conflict:
                people1 = peoples[i][2]
                people2 = peoples[j][2]
                conflicts_dict[people1] = True
                conflicts_dict[people2] = True
            j += 1
        i += 1
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))

Upvotes: 4

Views: 2688

Answers (5)

user3520634
user3520634

Reputation: 301

I cam up with an answer that seems to take O(N) time. The strategy is to sort the array by X values. For each X value, sweep left until a conflict is found, or the distance exceeds the conflict distance (500 M). If no conflict is found, sweep left in the same manner. With this technique, you limit the amount of searching.

Here is the code:

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
        and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)


def check_for_conflicts(peoples, conflict_radius):
    peoples.sort(key = lambda x: x[0])
    conflicts_dict = {}
    i = 0
    num_checks = 0
    # use a type of sweep strategy
    while i < len(peoples)  :
        conflict = False
        j = i + 1
        #sweep right
        while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j += 1
        j = i - 1
        #sweep left
        while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j -= 1
        i += 1
    print("num checks is {0}".format(num_checks))
    print("num checks per size is is {0}".format(num_checks/ NUM_PEOPLE))
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))

Upvotes: 0

user3520634
user3520634

Reputation: 301

I found a relatively solution to the problem. Sort the list of coordinates by the X value. Then look at each X value, one at a time. Sweep right, checking the position with the next position, until the end of the sweep area is reached (500 meters), or a conflict is found.

If no conflict is found, sweep left in the same manner. This method avoids unnecessary checks. For example, if there are 1,000,000 people in the park, then all of them will be in conflict. The algorithm will only check each person one time: once a conflict is found the search stops.

My time seems to be O(N).

Here is the code:

import math
import random
random.seed(1)  # Setting random number generator seed for repeatability

NUM_PEOPLE = 10000
PARK_SIZE = 128000  # Meters.
CONFLICT_RADIUS = 500  # Meters.

check_real_distance = lambda conflict_radius, people1, people2: people2[1] - people1[1] <= conflict_radius \
        and math.pow(people1[0] - people2[0], 2) + math.pow(people1[1] - people2[1], 2) <= math.pow(conflict_radius, 2)


def check_for_conflicts(peoples, conflict_radius):
    peoples.sort(key = lambda x: x[0])
    conflicts_dict = {}
    i = 0
    num_checks = 0
    # use a type of sweep strategy
    while i < len(peoples)  :
        conflict = False
        j = i + 1
        #sweep right
        while j < len(peoples) and peoples[j][0] - peoples[i][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[i], peoples[j])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j += 1
        j = i - 1
        #sweep left
        while j >= 0 and peoples[i][0] - peoples[j][0] <= conflict_radius \
                and not conflict and not conflicts_dict.get(i):
            num_checks += 1
            conflict = check_real_distance(conflict_radius, peoples[j], peoples[i])
            if conflict:
                conflicts_dict[i] = True
                conflicts_dict[j] = True
            j -= 1
        i += 1
    print("num checks is {0}".format(num_checks))
    print("num checks per size is is {0}".format(num_checks/ NUM_PEOPLE))
    return len(conflicts_dict.keys())

def gen_coord():
    return int(random.random() * PARK_SIZE)

if __name__ == '__main__':
    people_positions = [[gen_coord(), gen_coord()] for i in range(NUM_PEOPLE)]
    conflicts = check_for_conflicts(people_positions, CONFLICT_RADIUS)
    print("people in conflict: {}".format(conflicts))

Upvotes: 0

BPL
BPL

Reputation: 9863

Here's my solution to this interesting problem:

from math import sqrt
import math
import random


class Person():

    def __init__(self, x, y, conflict_radius=500):
        self.position = [x, y]
        self.valid = True
        self.radius = conflict_radius**2

    def validate_people(self, people):
        P0 = self.position

        for p in reversed(people):
            P1 = p.position
            dx = P1[0] - P0[0]
            dy = P1[1] - P0[1]
            dx2 = (dx * dx)

            if dx2 > self.radius:
                break

            dy2 = (dy * dy)
            d = dx2 + dy2

            if d <= self.radius:
                self.valid = False
                p.valid = False

    def __str__(self):
        p = self.position
        return "{0}:{1} - {2}".format(p[0], p[1], self.valid)


class Park():

    def __init__(self, num_people=10000, park_size=128000):
        random.seed(1)
        self.num_people = num_people
        self.park_size = park_size

    def gen_coord(self):
        return int(random.random() * self.park_size)

    def generate(self):
        return [[self.gen_coord(), self.gen_coord()] for i in range(self.num_people)]


def naive_solution(data):
    sorted_data = sorted(data, key=lambda x: x[0])
    len_sorted_data = len(sorted_data)
    result = []

    for index, pos in enumerate(sorted_data):
        print "{0}/{1} - {2}".format(index, len_sorted_data, len(result))
        p = Person(pos[0], pos[1])
        p.validate_people(result)
        result.append(p)

    return result

if __name__ == '__main__':
    people_positions = Park().generate()

    with_conflicts = len(filter(lambda x: x.valid, naive_solution(people_positions)))
    without_conflicts = len(filter(lambda x: not x.valid, naive_solution(people_positions)))
    print("people with conflicts: {}".format(with_conflicts))
    print("people without conflicts: {}".format(without_conflicts))

I'm sure the code can be still optimized further

Upvotes: 0

Tamim Addari
Tamim Addari

Reputation: 7841

You can see this topcoder link and section 'Closest pair'. You can modify the closest pair algorithm so that the distance h is always 50.

So , what you basically do is ,

  • Sort the people by X coordinate
  • Sweep from left to right.
  • Keep a balanced binary tree and keep all the points within 50 radii in the binary tree. The key of the binary tree would be the Y coordinates of the point
  • Select the points with Y-50 and Y+50 , this can be done with the binary tree in lg(n) time.
  • So the overall complexity becomes nlg(n)
  • Be sure to mark the points you find to skip those points in the future.

You can use set in C++ as the binary tree.But I couldn't find if python set supports range query or upper_bound and lower_bound.If someone knows , please point that out in the comments.

Upvotes: 0

samgak
samgak

Reputation: 24417

As you can see from the comments, there's lots of approaches to this problem. In an interview situation you'd probably want to list as many as you can and say what the strengths and weaknesses of each one are.

For the problem as stated, where you have a fixed radius, the simplest approach is probably rounding and hashing. k-d trees and the like are powerful data structures, but they're also quite complex and if you don't need to repeatedly query them or add and remove objects they might be overkill for this. Hashing can achieve linear time, versus spatial trees which are n log n, although it might depend on the distribution of points.

To understand hashing and rounding, just think of it as partitioning your space up into a grid of squares with sides of length equal to the radius you want to check against. Each square is given it's own "zip code" which you can use as a hash key to store values in that square. You can compute the zip code of a point by dividing the x and y co-ordinates by the radius, and rounding down, like this:

def get_zip_code(x, y, radius):
    return str(int(math.floor(x/radius))) + "_" + str(int(math.floor(y/radius)))

I'm using strings because it's simple, but you can use anything as long as you generate a unique zip code for each square.

Create a dictionary, where the keys are the zip codes, and the values are lists of all the people in that zip code. To check for conflicts, add the people one at a time, and before adding each one, test for conflicts with all the people in the same zip code, and the zip code's 8 neighbours. I've reused your method for keeping track of conflicts:

def check_for_conflicts(peoples, conflict_radius):

    index = 0
    d = {}
    conflicts_dict = {}
    for person in peoples:  

        # check for conflicts with people in this person's zip code
        # and neighbouring zip codes:
        for offset_x in range(-1, 2):
            for offset_y in range(-1, 2):
                 offset_zip_code = get_zip_code(person[0] + (offset_x * conflict_radius), person[1] + (offset_y * conflict_radius), conflict_radius)

                 if offset_zip_code in d:
                     # get a list of people in this zip:
                     other_people = d[offset_zip_code]
                     # check for conflicts with each of them:
                     for other_person in other_people:
                         conflict = check_real_distance(person, other_person, conflict_radius)
                         if conflict:
                             people1 = index
                             people2 = other_person[2]
                             conflicts_dict[people1] = True
                             conflicts_dict[people2] = True

        # add the new person to their zip code
        zip_code = get_zip_code(person[0], person[1], conflict_radius)
        if not zip_code in d:
            d[zip_code] = []
        d[zip_code].append([person[0], person[1], index])
        index += 1

    return len(conflicts_dict.keys())

The time complexity of this depends on a couple of things. If you increase the number of people, but don't increase the size of the space you are distributing them in, then it will be O(N2) because the number of conflicts is going to increase quadratically and you have to count them all. However if you increase the space along with the number of people, so that the density is the same, it will be closer to O(N).

If you're just counting unique people, you can keep a count if how many people in each zip code have at least 1 conflict. If its equal to everyone in the zip code, you can early out of the loop that checks for conflicts in a given zip after the first conflict with the new person, since no more uniques will be found. You could also loop through twice, adding all people on the first loop, and testing on the second, breaking out of the loop when you find the first conflict for each person.

Upvotes: 3

Related Questions