Reputation: 6041
Input is 200K pairs of doubles, i.e. key/value pairs. The keys have no particular range, i.e. they can be any numbers.
Now, the task is to group such big data into the following:
key1 => (value1, value2, value3)
key2 => (value1, value2, value3)
...
...
An EPSILON (e.g. 1e-8) can be used when comparing keys (doubles) for equality.
I have developed an O(n^2) solution, but it is too slow for 200K pairs. I am wondering about possible improvements; something like O(n log n) would be nice.
Another example:
Input: <0.1, 0.3>, <0.1, 0.4>, <0.2, 0.5>, <0.2, 0.6>, <0.3, 0.7>
Output
0.1 => (0.3, 0.4)
0.2 => (0.5, 0.6)
0.3 => (0.7)
Upvotes: 1
Views: 91
Reputation: 150108
In order to avoid the issue of one grouping key depending on the other grouping keys - that is, the issue of treating a run of keys like
(1.0, 1.0 + EPSILON, 1.0 + 2*EPSILON, 1.0 + 3*EPSILON)
consistently with a run of keys like
(1.0, 1.0 + 2*EPSILON, 1.0 + 4*EPSILON, ...)
- the most logical choice seems to be to use a HashSet and to create the hash key by quantizing the actual double value of the key into buckets EPSILON in size.
Depending on your requirements for EPSILON, you might adapt the discussion linked below to quantize your expected input range into something like a long:
Convert/Quantize Float Range to Integer Range
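A minimal sketch of that idea, assuming Java, a HashMap instead of a bare HashSet (so the values can be collected per key), and Math.round for the quantization - none of which are spelled out above:

    import java.util.*;

    public class GroupByQuantizedKey {
        static final double EPSILON = 1e-8;

        // Quantize a double key into a long bucket index, with buckets EPSILON wide.
        static long bucket(double key) {
            return Math.round(key / EPSILON);
        }

        // Collect the second element of each pair under the bucket of its first element.
        static Map<Long, List<Double>> group(double[][] pairs) {
            Map<Long, List<Double>> groups = new HashMap<>();
            for (double[] p : pairs) {
                groups.computeIfAbsent(bucket(p[0]), k -> new ArrayList<>()).add(p[1]);
            }
            return groups;  // one pass over the input, O(n) expected time
        }

        public static void main(String[] args) {
            double[][] pairs = {{0.1, 0.3}, {0.1, 0.4}, {0.2, 0.5}, {0.2, 0.6}, {0.3, 0.7}};
            group(pairs).forEach((b, values) ->
                    System.out.println((b * EPSILON) + " => " + values));
        }
    }

Note that two keys less than EPSILON apart can still fall on opposite sides of a bucket boundary and end up in different groups; that is the price of making the grouping independent of the order in which the keys are seen.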
Upvotes: 1
Reputation: 7807
Why not sort? Sort according to the first value and you are (almost) done. That is O(n log n).
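One way to read this suggestion, as a sketch in Java; the rule of starting a new group whenever a key differs from its predecessor by more than EPSILON is my assumption of what the "(almost)" leaves open:

    import java.util.*;

    public class GroupBySorting {
        static final double EPSILON = 1e-8;

        static List<List<Double>> group(double[][] pairs) {
            // Sort by the key (first element of each pair): O(n log n).
            double[][] sorted = pairs.clone();
            Arrays.sort(sorted, Comparator.comparingDouble(p -> p[0]));

            // Single sweep: O(n). Open a new group whenever the key moves
            // more than EPSILON away from the previous key.
            List<List<Double>> groups = new ArrayList<>();
            double prevKey = Double.NaN;
            for (double[] p : sorted) {
                if (groups.isEmpty() || Math.abs(p[0] - prevKey) > EPSILON) {
                    groups.add(new ArrayList<>());
                }
                groups.get(groups.size() - 1).add(p[1]);
                prevKey = p[0];
            }
            return groups;
        }

        public static void main(String[] args) {
            double[][] pairs = {{0.2, 0.5}, {0.1, 0.3}, {0.3, 0.7}, {0.1, 0.4}, {0.2, 0.6}};
            System.out.println(group(pairs));  // [[0.3, 0.4], [0.5, 0.6], [0.7]]
        }
    }

The caveat is the chaining effect the other answer alludes to: a run of keys in which each is within EPSILON of the next all collapses into one group, even if the first and last differ by far more than EPSILON. Comparing against the first key of the current group, rather than the previous key, is the other natural choice.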
Upvotes: 1