justyy

Reputation: 6041

What is the quickest way to group an array of double pairs?

The input is 200 K pairs of doubles, i.e. key/value pairs. The keys have no fixed range, i.e. a key can be any number.

Now, the task is to group this data into the following form:

key1 => (value1, value2, value3)
key2 => (value1, value2, value3)
...
...

An EPSILON (e.g. 1e-8) should be used when comparing keys (doubles) for equality.

I have developed an O(n^2) solution, but it is too slow for 200 K pairs. Are there any improvements? Something like O(n log n) would be nice.

Another example,

Input:  <0.1, 0.3>,  <0.1, 0.4>,  <0.2, 0.5>, <0.2, 0.6>, <0.3, 0.7>
Output

0.1 => (0.3, 0.4)
0.2 => (0.5, 0.6)
0.3 => (0.7)

Upvotes: 1

Views: 91

Answers (2)

Eric J.

Reputation: 150108

To avoid the problem of one grouping key depending on other grouping keys, that is, the problem of treating keys like

(1.0, 1.0 + EPSILON, 1.0 + 2xEPSILON, 1.0 + 3xEPSILON)

consistently with keys like

(1.0, 1.0 + 2xEPSILON, 1.0 + 4xEPSILON, ...)

the most logical choice seems to be a hash-based container: create a hash key by quantizing the actual double value of each key into buckets EPSILON in size.

Depending on your requirements for EPSILON, you might adapt the discussion below to quantize your expected input range into something like a long:

Convert/Quantize Float Range to Integer Range

Upvotes: 1

ElKamina

Reputation: 7807

Why not sort? Sort according to the first value and you are (almost) done. It is O(n log n).

Upvotes: 1
