mandia
mandia

Reputation: 71

Python: follow a "path" of tuples?

Short version: What I have: A list of 2-tuples, for example [("a", "b"), ("b", "c"), ("d", "e"), ("c", "d"), ("f", "g")] that are not necessarily alphabetically ordered

What I want: Giving a start letter (say "a") and an end letter (say "e") I'd like Python to find the list of available 2-tuples from the list above that would "link" the start letter to the end letter, so in this example, that list would be [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")] in that order (a --> b --> c --> d --> e)

Longer version: Hi all, this is my 1st post on SO, although I've been browsing it for ages and always found my answers here, great community!

I have some data analysis to do for my work, and I have a certain number of data sets (that I will represent by letters here for simplicity) of which I only know the mathematical difference: ("a" - "b"), ("b" - "c"), etc (these are my inputs). I will represent these inputs by 2-tuples. The idea is to calculate the difference between data sets "a" and "e", i.e. "a" - "e", which in this case can be obtained by summing some of the intermediate data sets differences (my inputs): ("a" - "b") + ("b" - "c") + ("c" - "d") + ("d" - "e") = "a" - "e".

I'd like to know if there is a Python module that does what I want already, or if there's a simple way to do this using Python's syntax. In the simple case above, each letter only appears on 2 tuples from the list, but in the general case, there could be an extra tuple containing the right letter but that doesn't allow to link the start letter to the end letter (e.g. if there was an extra tuple ("b", "h"), it would be found on the 1st iteration of the code, along with the tuple ("b", "c"), but it should be discarded because the letter "h" doesn't "lead" anywhere). I'm having trouble dealing with such cases.

I hope the question is clear enough, it's difficult to express in simple terms.

Thanks in advance!

Upvotes: 4

Views: 481

Answers (1)

yatu
yatu

Reputation: 88305

Looks like the way to go here is using some graph analysis tool to find the shortest path between a pair of nodes. Though this case is actually somewhat of a simplification of the problem, since you mention that each letter only appears on 2 tuples from the list, meaning that there will only be one path connecting a pair of nodes. Though the common scenario is to have multiple possible paths connecting a source and target nodes, in which case we need some algorithm to find the shortest among these.

So a way to go about this is using NetworkX to build a graph, having the list of tuples represent the edges of the graph, and look for the nx.shortest_path between a pair of source and target nodes:

import networkx as nx

edges = [("a", "b"), ("b", "c"), ("d", "e"), ("c", "d"), ("f", "g")]

G = nx.from_edgelist(edges)
path_nodes = nx.shortest_path(G, 'a', 'e')
# ['a', 'b', 'c', 'd', 'e']

If you want the output as a list of tuples, you could easily do:

list(zip(path_nodes[:-1], path_nodes[1:]))
# [('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]

Note that order is not a relevant factor here, sense this just basically defines a graph from the provided edges, and shortest_path will just seek for the minimum necessary graph edges to connect a source and target nodes.

Upvotes: 5

Related Questions