Reputation: 5152
Let's consider the following list of tuples:
[('AUD', 'M6A'),
('CHF', 'E7'),
('CHF', 'EUR'),
('CHF', 'M6E'),
('E7', 'CHF'),
('E7', 'EUR'),
('E7', 'M6E'),
('EMD', 'ES'),
('EMD', 'RTY'),
('ES', 'EMD'),
('ES', 'NQ'),
('EUR', 'CHF'),
('EUR', 'E7'),
('EUR', 'M6E'),
('GBP', 'M6B'),
('GF', 'LE'),
('LE', 'GF'),
('M6A', 'AUD'),
('M6B', 'GBP'),
('M6E', 'CHF'),
('M6E', 'E7'),
('M6E', 'EUR'),
('NIY', 'NKD'),
('NKD', 'NIY'),
('NQ', 'ES'),
('RTY', 'EMD')]
I would like to create from it a list of lists that clusters together the symbols that appear at least once together.
for instance ('CHF', 'EUR')
and ('E7', 'EUR')
would be merged into 1 list ['CHF', 'EUR', 'E7']
and so on.
Any idea how to achieve this?
For those who asked, here the loop idea to do that:
groups=[[]]
for item in li:
for group in groups:
if item[0] in group or item[1] in group:
group.append(item)
else:
groups.append([item])
Horribly slow and ugly - actually it stunned my machine with memory usage.
The expected result is [{'AUD', 'M6A'}, {'CHF', 'E7', 'EUR', 'M6E'}, {'EMD', 'ES', 'NQ', 'RTY'}, {'GBP', 'M6B'}, {'GF', 'LE'}, {'NIY', 'NKD'}]
Upvotes: 1
Views: 134
Reputation: 30258
You are looking for disjoint graphs, which you can use networkx
to solve:
In []:
import networkx as nx
edges = [('AUD', 'M6A'), ('CHF', 'E7'), ('CHF', 'EUR'), ...]
g = nx.Graph(edges)
list(nx.connected_components(g))
Out[]:
[{'AUD', 'M6A'},
{'CHF', 'E7', 'EUR', 'M6E'},
{'EMD', 'ES', 'NQ', 'RTY'},
{'GBP', 'M6B'},
{'GF', 'LE'},
{'NIY', 'NKD'}]
If you want to avoid using networkx
you can write it yourself reasonable efficiently (this should be O(n)
to create the graph and O(n)
to create the disjoint sets - which is overall O(n)
) e.g.:
In []:
g = {}
for a, b in edges:
g.setdefault(a, []).append(b)
g.setdefault(b, []).append(a)
r = []
while g:
s = set()
q = [next(iter(g))]
while q:
k = q.pop()
if k in s:
continue
s.add(k)
for e in g.pop(k):
q.append(e)
r.append(s)
r
Out[]:
[{'AUD', 'M6A'},
{'CHF', 'E7', 'EUR', 'M6E'},
{'EMD', 'ES', 'NQ', 'RTY'},
{'GBP', 'M6B'},
{'GF', 'LE'},
{'NIY', 'NKD'}]
Upvotes: 3
Reputation: 129
How I understood the problem:
l =[('AUD', 'M6A'),('CHF', 'E7'), ('CHF', 'EUR'), ('CHF', 'M6E'),
('E7', 'CHF'), ('E7', 'EUR'), ('E7', 'M6E'), ('EMD', 'ES'), ('EMD', 'RTY'),
('ES', 'EMD'), ('ES', 'NQ'), ('EUR', 'CHF'), ('EUR', 'E7'), ('EUR', 'M6E'),
('GBP', 'M6B'), ('GF', 'LE'), ('LE', 'GF'), ('M6A', 'AUD'), ('M6B', 'GBP'),
('M6E', 'CHF'), ('M6E', 'E7'), ('M6E', 'EUR'), ('NIY', 'NKD'),
('NKD', 'NIY'), ('NQ', 'ES'),('RTY', 'EMD')]
s = set()
[s.add(tt) for t in l for tt in t]
print(s)
Oh sorry result is in s! My fault!
Upvotes: -1