Reputation: 24652
I have a list of values (1-dimensional) and I would like to know the best data structure / algorithm for finding the nearest to a query value I have. Most of the solutions (all?) I found for questions here are for 2 or more dimensions. Can anybody suggest to me the approach for my case?
My instinct tells me to sort the data and use binary search somehow. By the way, there is no limit on the construction or insertion time for any tree needed, so probably someone can suggest a better tree than simply a sorted list.
Upvotes: 10
Views: 7610
Reputation: 48687
Using OCaml's Set
:
module S = Set.Make(struct type t = int let compare = compare end)
let nearest xs y =
let a, yisin, b = S.split y xs in
if yisin then y
else
let amax, bmin = S.max_elt a, S.min_elt b in
if abs (amax - y) < abs (bmin - y) then amax else bmin
Incidentally, you may appreciate my nth-nearest neighbor sample from OCaml for Scientists and The F#.NET Journal article Traversing networks: nth-nearest neighbors.
Upvotes: 0
Reputation: 28132
Sort the list and use binary search to find the element you are looking for, then compare your left and right neighbors. You can use an array which is O(1) access.
Something like:
int nearest(int[] list, int element) {
sort(list);
int idx = binarySearch(element, list);
// make sure you are accessing elements that exist
min = (element - list[idx-1] <= list[idx+1] - element) ? idx-1 : idx+1;
return list[min];
}
This is O(n log n), which will be amortized if you are going to perform many look ups.
EDIT: For that you'd have to move the sorting out of this method
Upvotes: 1
Reputation: 1014
If you need something faster than O(log(n)), which you can easily get with a sorted array or a binary search tree, you can use a van Emde Boas Tree. vEB trees give you O(log(log(n))) to search for the closest element on either side.
Upvotes: 9
Reputation: 22446
If insertion time is irrelevant, then binary search on a sorted array is the simplest way to achieve O(log N) query time. Each time an item is added sort everything. For each query, perform a binary search. If a match is found, return it. Otherwise, the binary search should return the index of the item, where it should have been inserted. Use this index to check the two neighboring items and determine which of them is closer to the query point.
I suppose that there are solutions with O(1) time. I will try to think of one that doesn't involve too much memory usage...
Upvotes: 2
Reputation: 4102
As you already mentioned, the fastest and easiest way should be sorting the data and then looking for the left and right neighbour of a data point.
Upvotes: 1