
K nearest points. Time complexity O(n), not O(nLogn). How?

Given a list of a million coordinates in the form of longitude and latitude, as in Google Maps, how would you print the k cities closest to a given location?

I was asked this question during an interview. The interviewer said it can be done in O(n) by using insertion sort up to k rather than sorting the whole list, which is O(N log N). I found other answers online, and most say O(N log N)... was he (the interviewer) correct?
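A minimal sketch of the interviewer's idea, assuming the distances have already been computed into a `double[]` (the class and method names are hypothetical): keep a sorted buffer of the k smallest distances seen so far and insertion-sort each new distance into it, dropping the largest. Each insertion shifts at most k elements, so the total cost is O(nk), which is O(n) only if k is treated as a constant.

```java
final class BoundedInsertionSort {
    /** Returns the k smallest values of dist in ascending order (hypothetical helper). */
    static double[] kSmallest(double[] dist, int k) {
        int m = Math.min(k, dist.length);
        double[] best = new double[m]; // sorted buffer of the smallest values seen so far
        int size = 0;
        for (double d : dist) {
            // Skip values that cannot be among the k smallest once the buffer is full.
            if (size == m && (m == 0 || d >= best[m - 1])) continue;
            // Start at the end of the buffer; when full, the largest value is overwritten (dropped).
            int i = (size < m) ? size++ : m - 1;
            // One insertion-sort step: shift larger values right (at most k moves).
            while (i > 0 && best[i - 1] > d) {
                best[i] = best[i - 1];
                i--;
            }
            best[i] = d;
        }
        return best;
    }
}
```

With k = 3 and distances {5, 1, 4, 2, 3}, the buffer ends as {1, 2, 3}.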

Upvotes: 4

Answers (4)

JJJ

Reputation: 1029

Working on the assumption that latitude and longitude have a given number of digits, we can actually use radix sort. It seems similar to Hanqiu's answer, but I'm not sure if it's the same one. The Wikipedia description:

In computer science, radix sort is a non-comparative integer sorting algorithm that sorts data with integer keys by grouping keys by the individual digits which share the same significant position and value. A positional notation is required, but because integers can represent strings of characters (e.g., names or dates) and specially formatted floating point numbers, radix sort is not limited to integers. Radix sort dates back as far as 1887 to the work of Herman Hollerith on tabulating machines.

The article says the following about efficiency:

The topic of the efficiency of radix sort compared to other sorting algorithms is somewhat tricky and subject to quite a lot of misunderstandings. Whether radix sort is equally efficient, less efficient or more efficient than the best comparison-based algorithms depends on the details of the assumptions made. Radix sort complexity is O(wn) for n keys which are integers of word size w. Sometimes w is presented as a constant, which would make radix sort better (for sufficiently large n) than the best comparison-based sorting algorithms, which all perform Θ(n log n) comparisons to sort n keys.

In your case, w corresponds to the word size of your latitude and longitude, that is, the number of digits. In particular, this becomes more efficient with lower precision (fewer digits) in your coordinates. Whether it's more efficient than n log n algorithms depends on your n and your implementation. Asymptotically, treating w as a constant, it's better than n log n.
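A minimal LSD radix sort sketch, assuming the distances have already been quantized to non-negative `int` keys (for example, whole metres); the class name is hypothetical, and a real version would carry each city's index along with its key. It makes one stable counting-sort pass per byte, so w = 4 passes for 32-bit keys and the running time is O(w·n). After sorting, the first k keys are the k smallest.

```java
final class RadixSortDistances {
    /** Sorts non-negative int keys in place with a byte-wise LSD radix sort. */
    static void sort(int[] keys) {
        int[] buffer = new int[keys.length];
        for (int shift = 0; shift < 32; shift += 8) {         // one pass per byte: w = 4
            int[] count = new int[257];
            for (int key : keys) count[((key >>> shift) & 0xFF) + 1]++;
            for (int b = 0; b < 256; b++) count[b + 1] += count[b]; // prefix sums = bucket starts
            for (int key : keys) buffer[count[(key >>> shift) & 0xFF]++] = key; // stable scatter
            System.arraycopy(buffer, 0, keys, 0, keys.length);
        }
    }
}
```

Stability of each pass is what makes the digit-by-digit approach work: ties on the current byte keep the order established by the lower bytes.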

Obviously, you'd still need to combine the two coordinates into an actual distance before sorting.
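One common way to combine latitude and longitude into a single distance is the haversine great-circle formula. A minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km (the class name is hypothetical):

```java
final class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius; spherical-Earth assumption

    /** Great-circle distance in km between two points given in decimal degrees. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                 + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp guards against a slightly exceeding 1 due to rounding.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

For example, two points on the equator 180 degrees apart are half the circumference apart, 6371π ≈ 20015 km.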

Upvotes: 0

Maljam

Reputation: 6284

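You could also use this algorithm with O(N) complexity (plus the cost of scanning capacity/resolution buckets), which exploits a "HashMap-like" array of buckets that automatically sorts the distances to within a given resolution.

Here's the code in Java, where City, Coordinate and getDistance stand for your own types: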

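import java.util.ArrayList;
import java.util.List;

City[] cities = //your city list
Coordinate coor = //the coordinate of interest
int k = 10;                                  // how many closest cities we want

double resolution = 0.1, capacity = 1000;    // bucket width and largest distance covered
int bucketCount = (int) (capacity / resolution);

// Java cannot create generic arrays, so use a list of buckets; bucket i
// holds the cities whose distance falls in [i * resolution, (i + 1) * resolution).
List<List<City>> cityDistances = new ArrayList<>(bucketCount);
for (int i = 0; i < bucketCount; i++) cityDistances.add(new ArrayList<>());
List<City> closestCities = new ArrayList<>();

for (City c : cities) {
    double distance = coor.getDistance(c);
    // Bucket index; distances beyond capacity are clamped into the last bucket.
    int hash = Math.min((int) (distance / resolution), bucketCount - 1);
    cityDistances.get(hash).add(c);
}

// Walk the buckets from nearest to farthest until at least k cities are collected.
for (int index = 0; index < bucketCount && closestCities.size() < k; index++) {
    closestCities.addAll(cityDistances.get(index));
}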

The idea is to loop through the list of cities, calculate each city's distance to the coordinate of interest, and use that distance to decide which bucket of the "HashMap-like" array cityDistances the city goes into. The smaller the distance, the closer the index is to 0.
The smaller the resolution, the more likely it is that closestCities ends up with exactly k (here 10) cities after the last loop, because each bucket is added whole and the cities within a bucket are not ordered.

Upvotes: -1

maxim1000

Reputation: 6365

This is the quickselect algorithm (https://en.wikipedia.org/wiki/Quickselect).

Basically, it's quicksort with one modification: whenever you have two halves, you continue with only one of them:

If a half contains the k-th position, continue subdividing and sorting it.
If a half lies completely after the k-th position, there is no need to sort it; we are not interested in those elements.
If a half lies completely before the k-th position, there is no need to sort it; we need all of those elements, and their order doesn't matter.

When it finishes, the closest k elements will be in the first k places of the array (but not necessarily sorted).
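A minimal quickselect sketch over an array of precomputed distances, using a random pivot and Lomuto partitioning (the class and method names are hypothetical). It uses a random pivot rather than median of medians, so O(n) is the expected time, not a guarantee. After it returns, the k smallest distances occupy `dist[0..k-1]` in no particular order.

```java
import java.util.concurrent.ThreadLocalRandom;

final class QuickSelect {
    /** Rearranges dist so that its k smallest values occupy indices 0..k-1 (unordered). */
    static void selectKSmallest(double[] dist, int k) {
        if (k <= 0 || k >= dist.length) return;          // nothing to separate
        int lo = 0, hi = dist.length - 1;
        while (lo < hi) {
            int p = partition(dist, lo, hi);
            // dist[0..p-1] <= dist[p] <= dist[p+1..], so a split at k-1/k is final.
            if (p == k - 1 || p == k) return;
            if (p < k) lo = p + 1; else hi = p - 1;      // keep only the half containing position k
        }
    }

    /** Lomuto partition around a random pivot; returns the pivot's final index. */
    private static int partition(double[] a, int lo, int hi) {
        swap(a, lo + ThreadLocalRandom.current().nextInt(hi - lo + 1), hi); // random pivot to the end
        double pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);       // grow the "< pivot" prefix
        }
        swap(a, store, hi);                              // pivot into its sorted position
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```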

Since at every step you process only one half, the time will be n + n/2 + n/4 + n/8 + ... which is at most 2n (ignoring constants), assuming each pivot splits the range roughly in half.
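Assuming each pivot halves the range, the total work is a geometric series with ratio 1/2:

```latex
n + \frac{n}{2} + \frac{n}{4} + \cdots + 1
  \;\le\; n \sum_{i=0}^{\infty} \left(\frac{1}{2}\right)^{i}
  \;=\; n \cdot \frac{1}{1 - \tfrac{1}{2}}
  \;=\; 2n
```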

For a guaranteed O(n) bound, you can always select a good pivot with, e.g., median of medians (https://en.wikipedia.org/wiki/Median_of_medians).

Upvotes: 2

Hanqiu Zhang

Reputation: 46

I think that, while calculating the distances, you can maintain a list of the K smallest distances seen so far.

Every time you get a new distance, if it is smaller than the largest one in the list, insert it and remove the largest one.

This insertion is O(K) if you are using a sorted array, or O(log K) if you are using a binary heap.

In the worst case, you will insert N times, so the total is O(NK) or O(N log K). If K is small enough to be treated as a constant, this is O(N).
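A minimal sketch of the binary-heap variant, using `java.util.PriorityQueue` with a reversed comparator as a max-heap of at most K distances (the class and method names are hypothetical). Each of the N distances costs at most one O(log K) removal and one O(log K) insertion, for O(N log K) in total.

```java
import java.util.Collections;
import java.util.PriorityQueue;

final class KSmallestHeap {
    /** Returns the k smallest distances in ascending order. */
    static double[] kSmallest(double[] dist, int k) {
        if (k <= 0) return new double[0];
        // Max-heap: the root is the largest of the k best distances seen so far.
        PriorityQueue<Double> heap = new PriorityQueue<Double>(k, Collections.reverseOrder());
        for (double d : dist) {
            if (heap.size() < k) {
                heap.add(d);
            } else if (d < heap.peek()) {  // smaller than the current k-th smallest
                heap.poll();                // remove the largest
                heap.add(d);
            }
        }
        double[] result = new double[heap.size()];
        // poll() yields the largest first, so fill the result from the back.
        for (int i = result.length - 1; i >= 0; i--) result[i] = heap.poll();
        return result;
    }
}
```

Compared with a sorted array, the heap only keeps the maximum at hand, which is all the "is it smaller than the largest one?" test needs.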

Upvotes: 3
