Reputation: 6041
Input is 200K pairs of doubles, i.e. key/value pairs. The keys have no particular range, i.e. they can be any numbers.
Now, the task is to group such big data into the following:
key1 => (value1, value2, value3)
key2 => (value1, value2, value3)
...
...
An EPSILON (e.g. 1e-8) can be used when comparing keys (doubles) for equality.
I have developed an O(n^2) solution, but it is too slow for 200K pairs. I am wondering about possible improvements; something like O(n log n) would be nice.
Another example:
Input: <0.1, 0.3>, <0.1, 0.4>, <0.2, 0.5>, <0.2, 0.6>, <0.3, 0.7>
Output
0.1 => (0.3, 0.4)
0.2 => (0.5, 0.6)
0.3 => (0.7)
Upvotes: 1
Views: 91
Reputation: 150108
In order to avoid the issue of one grouping key depending on the other grouping keys - that is, the issue of treating a run of keys like
(1.0, 1.0 + EPSILON, 1.0 + 2*EPSILON, 1.0 + 3*EPSILON)
consistently with a run of keys like
(1.0, 1.0 + 2*EPSILON, 1.0 + 4*EPSILON, ...)
- the most logical choice seems to be to use a HashSet and to create the hash key by quantizing the actual double value of the key into buckets EPSILON in size.
Depending on your requirements for EPSILON, you might adapt the discussion linked below to quantize your expected input range into something like a long:
Convert/Quantize Float Range to Integer Range
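A minimal sketch of that idea, assuming Java, a HashMap instead of a bare HashSet (so the values can be collected per key), and Math.round for the quantization - none of which are spelled out above:

    import java.util.*;

    public class GroupByQuantizedKey {
        static final double EPSILON = 1e-8;

        // Quantize a double key into a long bucket index, with buckets EPSILON wide.
        static long bucket(double key) {
            return Math.round(key / EPSILON);
        }

        // Collect the second element of each pair under the bucket of its first element.
        static Map<Long, List<Double>> group(double[][] pairs) {
            Map<Long, List<Double>> groups = new HashMap<>();
            for (double[] p : pairs) {
                groups.computeIfAbsent(bucket(p[0]), k -> new ArrayList<>()).add(p[1]);
            }
            return groups;  // one pass over the input, O(n) expected time
        }

        public static void main(String[] args) {
            double[][] pairs = {{0.1, 0.3}, {0.1, 0.4}, {0.2, 0.5}, {0.2, 0.6}, {0.3, 0.7}};
            group(pairs).forEach((b, values) ->
                    System.out.println((b * EPSILON) + " => " + values));
        }
    }

Note that two keys less than EPSILON apart can still fall on opposite sides of a bucket boundary and end up in different groups; that is the price of making the grouping independent of the order in which the keys are seen.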
Upvotes: 1
Reputation: 7807
Why not sort? Sort according to the first value and you are (almost) done. That is O(n log n).
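One way to read this suggestion, as a sketch in Java; the rule of starting a new group whenever a key differs from its predecessor by more than EPSILON is my assumption of what the "(almost)" leaves open:

    import java.util.*;

    public class GroupBySorting {
        static final double EPSILON = 1e-8;

        static List<List<Double>> group(double[][] pairs) {
            // Sort by the key (first element of each pair): O(n log n).
            double[][] sorted = pairs.clone();
            Arrays.sort(sorted, Comparator.comparingDouble(p -> p[0]));

            // Single sweep: O(n). Open a new group whenever the key moves
            // more than EPSILON away from the previous key.
            List<List<Double>> groups = new ArrayList<>();
            double prevKey = Double.NaN;
            for (double[] p : sorted) {
                if (groups.isEmpty() || Math.abs(p[0] - prevKey) > EPSILON) {
                    groups.add(new ArrayList<>());
                }
                groups.get(groups.size() - 1).add(p[1]);
                prevKey = p[0];
            }
            return groups;
        }

        public static void main(String[] args) {
            double[][] pairs = {{0.2, 0.5}, {0.1, 0.3}, {0.3, 0.7}, {0.1, 0.4}, {0.2, 0.6}};
            System.out.println(group(pairs));  // [[0.3, 0.4], [0.5, 0.6], [0.7]]
        }
    }

The caveat is the chaining effect the other answer alludes to: a run of keys in which each is within EPSILON of the next all collapses into one group, even if the first and last differ by far more than EPSILON. Comparing against the first key of the current group, rather than the previous key, is the other natural choice.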
Upvotes: 1