ephemeral
ephemeral

Reputation: 129

Finding edges represented as lists

Suppose we have a bipartite graph in which the nodes are represented as list. For example suppose a bipartite graph has the nodes as l1 = [1,2,3,4,5] and l2 = [6,7,8,9,10] are the nodes in the two partitions The edges are [[1,8, 'a'], [4,9,'b']] represented as list of list as given in the figure 1

figure 2

If somehow we have merged the nodes of the bipartite graph and this is now represented as in 1 by [[1,2,3], [4, 5]] and [[6,7] , [8, 9, 10]] then in this new graph, we would have edges between these groups if there is an edge between any pair in the original graph. For example, in the above, there would be an a edge between groups [1,2,3] and [8,9,10] since there is an edge originally between 1 and 8, this is depicted in Figure 1. How do find the edges in the new graph in Python, what would be a suitable data structure representation and how to find the resulting edges from this original graph?

figure 1

I have used lists for this, but the problem is to find these edges, I am having to iterate over all the nodes in the new graph to check if there should be an edge. Is there a more efficient way to do this?

Code I have tried :

l1 = [1,2,3,4,5]
l2 = [6,7,8,9,10]
l3 = [[1,2,3], [4, 5]]
l4 = [[6,7] , [8, 9, 10]]
edges = [[1,8, 'a'], [4,9,'b']]

for e in edges:
    for l in l3:
        for k in l4:
            if e[0] in l and e[1] in k:
                print(e[0], e[1], e[2])

Upvotes: 1

Views: 225

Answers (2)

Uli Sotschok
Uli Sotschok

Reputation: 1236

Let's start by getting the index of the group that contains the specific value.

def idx_group_in_list(value, list_) -> int:
    """e.g. value=2, list_=[[1,2],[3,4]] -> 0
    because the value 2 is in the first (idx=0) inner list"""
    for idx, l in enumerate(list_):
        if value in l:
            return idx
    return -1

In the following, I am working with a dictionary-based solution. This makes it easier to check if edges already exist.

l3 = [[1, 2, 3], [4, 5]]
l4 = [[6, 7], [8, 9, 10]]
edges = [[1, 8, 'a'], [4, 9, 'b']]

new_edges = {}
for e in edges:
    # left
    l_idx = idx_group_in_list(e[0], l3)
    r_idx = idx_group_in_list(e[1], l4)
    if (l_idx, r_idx) in new_edges:
        pass    # two edges are squeezed. Maybe add some special stuff here
    new_edges[(l_idx, r_idx)] = e[2]

print(new_edges)
expected_output = {(0, 1): 'a', (1, 1): 'b'}
print(expected_output == new_edges)

Edit:

I've made some very simple performance tests. Most of the code stayed the same I've just changed the way, the lists are generated.

num_nodes_per_side = 1000
left = [[i] for i in range(num_nodes_per_side)]
right = [[i] for i in range(num_nodes_per_side, num_nodes_per_side*2)]
edges = [[i, j, 'a'] for i, j in zip(range(num_nodes_per_side), range(num_nodes_per_side, num_nodes_per_side*2))]

# result for num_nodes_per_side = 3
>>> left
[[0], [1], [2]]
>>> right
[[3], [4], [5]]
>>> edges
[[0, 3, 'a'], [1, 4, 'a'], [2, 5, 'a']]

This means you have from every left group one edge to a right group. In the following are my timeit results, based on num_nodes_per_side.

  • 10: 2.0693999999987778e-05
  • 100: 0.0004394410000000404
  • 1000: 0.042664883999999986
  • 10000: 4.629786907

Upvotes: 2

recnac
recnac

Reputation: 3744

To achieve a better performance, you can use dict to inverted index(make sure node id is unique). which improve the search time complexity from O(n) to O(1), but you need take the cost to rebuild the data structure. Here is a sample code:

d3 = {node : idx for idx, l in enumerate(l3) for node in l}
d4 = {node : idx for idx, l in enumerate(l4) for node in l}

for node1, node2, name in edges:
    if node1 in d3 and node2 in d4 or node2 in d3 and node1 in d4:
        print(node1, node2, name)

output:

1 8 a
4 9 b

If you want to check duplicate edge like @Uli Sotschok does, it is simliar:

new_edges = {}
for node1, node2, name in edges:
    if node1 in d3 and node2 in d4:
        l_idx = d3[node1]
        r_idx = d4[node2]

        if (l_idx, r_idx) not in new_edges:
            new_edges[(l_idx, r_idx)] = name

print(new_edges)
expected_output = {(0, 1): 'a', (1, 1): 'b'}
print(expected_output == new_edges)

output:

{(0, 1): 'a', (1, 1): 'b'}
True

Upvotes: 1

Related Questions