Reputation: 324
I have this list:
ls = [[0, 'C', [1, 2, 3, 4], 'E', []],
[1, 'C', [0, 5, 6, 7], 'E', []],
[2, 'H', [0], '-', []],
[3, 'H', [0], '-', []],
[4, 'H', [0], '-', []],
[5, 'H', [1], '-', []],
[6, 'O', [1], 'X', []],
[7, 'H', [1], '-', []]]
This represents a molecule. The first column is just a number, the second column is an atom in the molecule, the third column tells which atoms this atom is bound to. For example, atom 0 is bound to atoms 1, 2, 3 and 4. I want to find out how far away each atom is from the oxygen, and this is info is stored to the last column. So the output should be:
[[0, 'C', [1, 2, 3, 4], 'E', 2],
[1, 'C', [0, 5, 6, 7], 'E', 1],
[2, 'H', [0], '-', 3],
[3, 'H', [0], '-', 3],
[4, 'H', [0], '-', 3],
[5, 'H', [1], '-', 2],
[6, 'O', [1], 'X', 0],
[7, 'H', [1], '-', 2]]
I tried this loop and it works just fine:
def dists(data):
new_data = data
# Loop to find the distances from the X-atom:
for sl2 in new_data:
if sl2[3] == "X":
sl2[4] = 0
next1 = sl2[2]
for number1, row1 in enumerate(new_data):
for a1 in next1:
if new_data[number1][0] == a1:
if type(new_data[a1][4]) == list:
new_data[a1][4] = 1
next2 = new_data[a1][2]
for number2, row2 in enumerate(new_data):
for a2 in next2:
if new_data[number2][0] == a2:
if type(new_data[a2][4]) == list:
new_data[a2][4] = 2
next3 = new_data[a2][2]
for number3, row3 in enumerate(new_data):
for a3 in next3:
if new_data[number3][0] == a3:
if type(new_data[a3][4]) == list:
new_data[a3][4] = 3
next4 = new_data[a3][2]
#etc...
return new_data
ls2 = dists(ls)
But now I have to make many nested for loops. How can I turn these nested loops into a single loop?
EDIT for @deadshot
The value of the fifth column comes from these:
sl2[4] = 0
new_data[a1][4] = 1
new_data[a2][4] = 2
new_data[a3][4] = 3
etc...
In the fourth column the "X" just helps me to find the starting point which is the oxygen atom.
Upvotes: 3
Views: 82
Reputation: 71471
Shorter recursive solution:
ls = [[0, 'C', [1, 2, 3, 4], 'E', []], [1, 'C', [0, 5, 6, 7], 'E', []], [2, 'H', [0], '-', []], [3, 'H', [0], '-', []], [4, 'H', [0], '-', []], [5, 'H', [1], '-', []], [6, 'O', [1], 'X', []], [7, 'H', [1], '-', []]]
d_ls = {a:b for a, *b in ls} #convert ls to a dictionary for faster lookup
def to_node(target, nodes, c = []):
if any(d_ls[i][0] == target for i in nodes):
yield len(c)+1
else:
for i in filter(lambda x:x not in c, nodes):
yield from to_node(target, d_ls[i][1], c+[i])
r = [[*a, 0 if a[1] == 'O' else next(to_node('O', a[2]), [a[0]])] for *a, _ in ls]
Output:
[[0, 'C', [1, 2, 3, 4], 'E', 2],
[1, 'C', [0, 5, 6, 7], 'E', 1],
[2, 'H', [0], '-', 3],
[3, 'H', [0], '-', 3],
[4, 'H', [0], '-', 3],
[5, 'H', [1], '-', 2],
[6, 'O', [1], 'X', 0],
[7, 'H', [1], '-', 2]]
Upvotes: 0
Reputation: 6642
Since this is a graph problem, as Christian Sloper mentioned, you could also use networkx:
import networkx as nx
ls = [[0, 'C', [1, 2, 3, 4], 'E', []],
[1, 'C', [0, 5, 6, 7], 'E', []],
[2, 'H', [0], '-', []],
[3, 'H', [0], '-', []],
[4, 'H', [0], '-', []],
[5, 'H', [1], '-', []],
[6, 'O', [1], 'X', []],
[7, 'H', [1], '-', []]]
# identify atom marked as X
x_index = next(filter(lambda x: x[3] == "X", ls))[0]
G = nx.Graph()
G.add_nodes_from([atom[0] for atom in ls])
G.add_edges_from(set((a1[0], a2) for a1 in ls for a2 in a1[2]))
map_atom_dist = nx.single_source_shortest_path_length(G, x_index)
# map_atom_dist: {6: 0, 1: 1, 0: 2, 5: 2, 7: 2, 2: 3, 3: 3, 4: 3}
[atom[:-1] + [map_atom_dist[atom[0]]] for atom in ls]
# Output:
[[0, 'C', [1, 2, 3, 4], 'E', 2],
[1, 'C', [0, 5, 6, 7], 'E', 1],
[2, 'H', [0], '-', 3],
[3, 'H', [0], '-', 3],
[4, 'H', [0], '-', 3],
[5, 'H', [1], '-', 2],
[6, 'O', [1], 'X', 0],
[7, 'H', [1], '-', 2]]
Upvotes: 1
Reputation: 7510
In computer science terms, what you have here is what is known as a graph. Your molecules are what known as "nodes" or "vertices" and your connections between them is known as "edges". You need to find the distance between the oxygen and all other nodes. This can be done with what is called a Breadth first search (there are other methods, but i think this is the easiest one to start with)
I strongly suggest you read the wikipedia page on this, but here is a python version adapted to your data structure:
from collections import deque
def bfs(ls):
root = [a for a,b, *(_) in ls if b == 'O' ][0]
ls[root][4] = 0 # mark oxygen as distance 0
queue = deque([(root,0)])
discovered = set([root])
while queue:
v,d = queue.popleft()
for edge in ls[v][2]:
if edge not in discovered:
discovered.add(edge)
queue.append((edge,d+1))
ls[edge][4] = d+1 # add distance to new atom
runs like this:
bfs(ls)
ls
after run:
[[0, 'C', [1, 2, 3, 4], 'E', 2],
[1, 'C', [0, 5, 6, 7], 'E', 1],
[2, 'H', [0], '-', 3],
[3, 'H', [0], '-', 3],
[4, 'H', [0], '-', 3],
[5, 'H', [1], '-', 2],
[6, 'O', [1], 'X', 0],
[7, 'H', [1], '-', 2]]
End note: You can leverage your data structure to avoid using discovered as as separate variable, but I have included it here to more conform to the pseudo code on the wiki page. This way it is easier to compare the python code and the theory.
Upvotes: 2