Greyhound
Greyhound

Reputation: 111

Tricky Median Question

Given n points, choose a point in the given list such that the sum of distances to this point is minimum ,compared to all others.

Distance is measured in the following manner.

For a point (x,y) all 8 adjacent points have distance 1.

(x+1,y)(x+1,y+1),(x+1,y-1),(x,y+1),(x,y-1),(x-1,y)(x-1,y+1),(x-1,y-1)

EDIT

More clearer explanation.

A function foo is defined as

foo(point_a,point_b) = max(abs(point_a.x - point_b.x),abs(point_a.y - point_b.y))

Find a point x such that sum([foo(x,y) for y in list_of_points]) is minimum.

Example

Input:

12 -14
-3 3
-14 7
-14 -3
2 -12
-1 -6

Output

-1 -6

Eg: Distance between (4,5) and 6,7) is 2.

This can be done in O(n^2) time, by checking the sum of each pair. Is there any better algorithm to do it?

Upvotes: 11

Views: 759

Answers (2)

Keith Randall
Keith Randall

Reputation: 23265

I can imagine a scheme better than O(n^2), at least in the common case.

Build a quadtree out of your input points. For each node in the tree, compute the number and average position of the points within that node. Then for each point, you can use the quadtree to compute its distance to all other points in less than O(n) time. If you're computing the distance from a point p to a distant quadtree node v, and v doesn't overlap the 45 degree diagonals from p, then the total distance from p to all the points in v is easy to compute (for v which are more horizontally than vertically separated from p, it is just v.num_points * |p.x - v.average.x|, and similarly using y coordinates if v is predominately vertically seperated). If v overlaps one of the 45 degree diagonals, recurse on its components.

That should beat O(n^2), at least when you can find a balanced quadtree to represent your points.

Upvotes: 1

Karoly Horvath
Karoly Horvath

Reputation: 96366

Update: it sometimes fails to find the optimum, I'll leave this here till I find the problem.

this is O(n): nth is O(n) (expected, not worst), iterating over the list is O(n). If you need strict O() then pick the middle element with sorting but then it's going to be O(n*log(n)).

Note: it's easy to modifiy it to return all the optimal points.

import sys

def nth(sample, n):
    pivot = sample[0]
    below = [s for s in sample if s < pivot]
    above = [s for s in sample if s > pivot]
    i, j = len(below), len(sample)-len(above)
    if n < i:      return nth(below, n)
    elif n >= j:   return nth(above, n-j)
    else:          return pivot

def getbest(li):
    ''' li is a list of tuples (x,y) '''
    l = len(li)
    lix = [x[0] for x in li]
    liy = [x[1] for x in li]

    mid_x1 = nth(lix, l/2) if l%2==1 else nth(lix, l/2-1)
    mid_x2 = nth(lix, l/2)
    mid_y1 = nth(liy, l/2) if l%2==1 else nth(liy, l/2-1)
    mid_y2 = nth(liy, l/2)

    mindist = sys.maxint
    minp = None
    for p in li:
        dist = 0 if mid_x1 <= p[0] <= mid_x2 else min(abs(p[0]-mid_x1), abs(p[0]-mid_x2))
        dist += 0 if mid_y1 <= p[1] <= mid_y2 else min(abs(p[1]-mid_y1), abs(p[1]-mid_y2))
        if dist < mindist:
            minp, mindist = p, dist
    return minp

It's based on the solution of the one dimensional problem - for a list of numbers find a number for which the sum distance is the minimum.

The solution for this is the middle element of the (sorted) list or any number between the two middle elements (including these two elements) if there are an even number of elements in the list.

Update: my nth algorithm seems to be very slow, probably there is a better way to rewrite it, sort outperforms it with < 100000 elements, so if you do speed comparison, just add sort(lix); sort(liy); and

def nth(sample, n):
    return sample[n]

For anyone out there who wants to test his solution, here is what I use. Just run a loop, generate input and compare your solution with the output of bruteforce.

import random
def example(length):
    l = []
    for x in range(length):
        l.append((random.randint(-100, 100), random.randint(-100,100)))
    return l

def bruteforce(li):
    bestsum = sys.maxint
    bestp = None
    for p in li:
        sum = 0
        for p1 in li:
            sum += max(abs(p[0]-p1[0]), abs(p[1]-p1[1]))
        if sum < bestsum:
            bestp, bestsum = p, sum
    return bestp

Upvotes: 5

Related Questions