Chap
Chap

Reputation: 3837

Grouping connected pairs of values

I have a list containing unique pairs of values x and y; for example:

x      y
--     --
1      A
2      A
3      A
4      B
5      A
5      C
6      D
7      D
8      C
8      E
9      B
9      F
10     C
10     G

I want to divide this list of pairs as follows:

Group 1
1      A
2      A
3      A
5      A
5      C
8      C
10     C
8      E
10     G

Group 2
4      B
9      B
9      F

Group 3
6      D
7      D

Group 1 contains

The pairs in Group 2 can't be reached in such a manner from any pairs in Group 1, nor can the pairs in Group 3 be reached from either Group 1 or Group 2.

As suggested in Group 1, the chain of connections can be arbitrarily long.

I'm exploring solutions using Perl, but any sort of algorithm, including pseudocode, would be fine. For simplicity, assume that all of the data can fit in data structures in memory.

[UPDATE] Because I need to apply this approach to 5.3 billion pairs, scaleability is important to me.

Upvotes: 2

Views: 246

Answers (2)

Dave
Dave

Reputation: 9085

You can think of the letters and numbers as nodes of a graph, and the pairs as edges. Divide this graph into connected components in linear time.

The connected component with 'A' forms group 1. The other connected components form the other groups.

Upvotes: 1

Prune
Prune

Reputation: 77900

Pick a starting point. Find all points reachable from that, removing from the master list. Repeat for all added points, until no more can be reached. Move to the next group, starting with another remaining point. Continue until you have no more remaining points.

pool = [(1 A), (2 A), (3 A), (4 B), ... (10 G)]
group_list = []
group = []
pos = 0

while pool is not empty
    group = [ pool[0] ]   # start with next available point
    pos = -1

    while pos+1 < size(group)  // while there are new points in the group
        pos += 1
        group_point = group[pos]    // grab next available point
        for point in pool           // find all remaining points reachable
            if point and group_point have a coordinate in common
                remove point from pool
                add point to group

    // we've reached closure with that starting point
    add group to group_list

return group_list

Upvotes: 1

Related Questions