jim jarnac
jim jarnac

Reputation: 5152

python - create groups of tuples with tuples

Let's consider the following list of tuples:

[('AUD', 'M6A'),
 ('CHF', 'E7'),
 ('CHF', 'EUR'),
 ('CHF', 'M6E'),
 ('E7', 'CHF'),
 ('E7', 'EUR'),
 ('E7', 'M6E'),
 ('EMD', 'ES'),
 ('EMD', 'RTY'),
 ('ES', 'EMD'),
 ('ES', 'NQ'),
 ('EUR', 'CHF'),
 ('EUR', 'E7'),
 ('EUR', 'M6E'),
 ('GBP', 'M6B'),
 ('GF', 'LE'),
 ('LE', 'GF'),
 ('M6A', 'AUD'),
 ('M6B', 'GBP'),
 ('M6E', 'CHF'),
 ('M6E', 'E7'),
 ('M6E', 'EUR'),
 ('NIY', 'NKD'),
 ('NKD', 'NIY'),
 ('NQ', 'ES'),
 ('RTY', 'EMD')]

I would like to create from it a list of lists that clusters together the symbols that appear at least once together.

for instance ('CHF', 'EUR') and ('E7', 'EUR') would be merged into 1 list ['CHF', 'EUR', 'E7'] and so on.

Any idea how to achieve this?


For those who asked, here the loop idea to do that:

groups=[[]]
for item in li:
    for group in groups:
        if item[0] in group or item[1] in group:
            group.append(item)
        else:
            groups.append([item])

Horribly slow and ugly - actually it stunned my machine with memory usage.

The expected result is [{'AUD', 'M6A'}, {'CHF', 'E7', 'EUR', 'M6E'}, {'EMD', 'ES', 'NQ', 'RTY'}, {'GBP', 'M6B'}, {'GF', 'LE'}, {'NIY', 'NKD'}]

Upvotes: 1

Views: 134

Answers (2)

AChampion
AChampion

Reputation: 30258

You are looking for disjoint graphs, which you can use networkx to solve:

In []:
import networkx as nx
edges = [('AUD', 'M6A'), ('CHF', 'E7'), ('CHF', 'EUR'), ...]
g = nx.Graph(edges)
list(nx.connected_components(g))

Out[]:
[{'AUD', 'M6A'},
 {'CHF', 'E7', 'EUR', 'M6E'},
 {'EMD', 'ES', 'NQ', 'RTY'},
 {'GBP', 'M6B'},
 {'GF', 'LE'},
 {'NIY', 'NKD'}]

​If you want to avoid using networkx you can write it yourself reasonable efficiently (this should be O(n) to create the graph and O(n) to create the disjoint sets - which is overall O(n)) e.g.:

In []:
g = {}
for a, b in edges:
    g.setdefault(a, []).append(b)
    g.setdefault(b, []).append(a)

r = []
while g:
    s = set()
    q = [next(iter(g))]
    while q:
        k = q.pop()
        if k in s:
            continue
        s.add(k)
        for e in g.pop(k):
            q.append(e)
    r.append(s)
r

Out[]:
[{'AUD', 'M6A'},
 {'CHF', 'E7', 'EUR', 'M6E'},
 {'EMD', 'ES', 'NQ', 'RTY'},
 {'GBP', 'M6B'},
 {'GF', 'LE'},
 {'NIY', 'NKD'}]

Upvotes: 3

furoscame
furoscame

Reputation: 129

How I understood the problem:

l =[('AUD', 'M6A'),('CHF', 'E7'), ('CHF', 'EUR'), ('CHF', 'M6E'), 
('E7', 'CHF'), ('E7', 'EUR'), ('E7', 'M6E'), ('EMD', 'ES'), ('EMD', 'RTY'),
('ES', 'EMD'), ('ES', 'NQ'), ('EUR', 'CHF'), ('EUR', 'E7'), ('EUR', 'M6E'),
('GBP', 'M6B'), ('GF', 'LE'), ('LE', 'GF'), ('M6A', 'AUD'), ('M6B', 'GBP'),
('M6E', 'CHF'), ('M6E', 'E7'), ('M6E', 'EUR'), ('NIY', 'NKD'), 
('NKD', 'NIY'), ('NQ', 'ES'),('RTY', 'EMD')]

s = set()
[s.add(tt) for t in l for tt in t]
print(s)

Oh sorry result is in s! My fault!

Upvotes: -1

Related Questions