Reputation: 510
I have a pandas DataFrame as shown below. It has many more columns, but they are not relevant to this task. The column id holds the sentence ID, while the columns e1 and e2 contain entities (words) of the sentence, with their relationship in the column r:
id e1 e2 r
10 a-5 b-17 A
10 b-17 a-5 N
17 c-1 a-23 N
17 a-23 c-1 N
17 d-30 g-2 N
17 g-20 d-30 B
I also created a graph for each sentence. Each graph is built from a list of edges that looks something like this:
[('wordB-5', 'wordA-1'), ('wordC-8', 'wordA-1'), ...]
All of these edge lists are stored in one list (a list of lists), where each element holds all the edges of one sentence. That is, list[0] has the edges of sentence 0, and so on.
Now I want to perform operations like these:
graph = nx.Graph(graph_edges[i])
shortest_path = nx.shortest_path(graph, source="e1", target="e2")
result_length = len(shortest_path)
result_path = shortest_path
For each row in the DataFrame, I'd like to calculate the shortest path (from the entity in e1 to the entity in e2) and save all of the results in new columns of the DataFrame, but I have no idea how to do that.
I tried constructions such as this:
e1 = DF["e1"].tolist()
e2 = DF["e2"].tolist()
for id in DF["sentenceID"]:
    graph = nx.Graph(graph_edges[id])
    shortest_path = nx.shortest_path(graph, source=e1, target=e2)
    result_length = len(shortest_path)
    result_path = shortest_path
to create the data, but it says the target is not in the graph.
The desired new DataFrame:
id e1 e2 r length path
10 a-5 b-17 A 4 ..
10 b-17 a-5 N 4 ..
17 c-1 a-23 N 3 ..
17 a-23 c-1 N 3 ..
17 d-30 g-2 N 7 ..
17 g-20 d-30 B 7 ..
Upvotes: 3
Views: 1851
Reputation: 510
For anyone who's interested in the solution (thanks to Ram Narasimhan):
pathlist, len_list = [], []
sources, targets = DF["e1"].tolist(), DF["e2"].tolist()
sentence_ids = DF["id"].tolist()
for sid, s, t in zip(sentence_ids, sources, targets):
    graph = nx.Graph(graph_edges[sid])  # construct the graph for this sentence
    try:
        path = nx.shortest_path(graph, source=s, target=t)
        length = nx.shortest_path_length(graph, source=s, target=t)
    except nx.NetworkXNoPath:
        # the two entities are not connected in this sentence graph
        path = "No Path"
        length = "No Pathlength"
    pathlist.append(path)
    len_list.append(length)
# Add these lists as new columns in the DF
DF['length'] = len_list
DF['path'] = pathlist
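A possible refinement of the loop above, offered only as a sketch: it builds each sentence graph once rather than once per row, and it also catches nx.NodeNotFound, which networkx raises when the source or target is missing from the graph (the "target is not in the graph" situation from the question). The helper name path_and_length, the sample graph_edges dict keyed by sentence id, the sample rows, and the use of None for missing results are all assumptions made for illustration. Note that the length here is counted in edges, whereas len(shortest_path) in the question counts nodes.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample data: edges keyed by sentence id (an assumption;
# the original code indexes a list of edge lists by sentence id).
graph_edges = {
    10: [("a-5", "x-1"), ("x-1", "b-17")],
    17: [("c-1", "a-23"), ("d-30", "y-2")],
}
DF = pd.DataFrame({
    "id": [10, 10, 17, 17],
    "e1": ["a-5", "b-17", "c-1", "c-1"],
    "e2": ["b-17", "a-5", "d-30", "g-2"],
})

# Build each sentence graph once instead of once per row.
graphs = {sid: nx.Graph(edges) for sid, edges in graph_edges.items()}

def path_and_length(sid, source, target):
    """Return (path, length in edges), or (None, None) if no path can be computed."""
    try:
        path = nx.shortest_path(graphs[sid], source=source, target=target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # Either the nodes are disconnected or one of them is not in the graph.
        return None, None
    return path, len(path) - 1

results = [path_and_length(sid, s, t)
           for sid, s, t in zip(DF["id"], DF["e1"], DF["e2"])]
DF["path"] = [path for path, _ in results]
DF["length"] = [length for _, length in results]
```

Using None rather than strings such as "No Path" keeps the length column numeric, so rows without a path can be filtered with DF["length"].isna().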
Upvotes: 1
Reputation: 22496
Here's one way to do what you are trying to do, in three distinct steps so that it is easier to follow along.
First, create the networkx graph object.
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
elist = [[('a-5', 'b-17'), ('b-17', 'c-1')],   # sentence 1
         [('c-1', 'a-23'), ('a-23', 'c-1')],   # sentence 2
         [('b-17', 'g-2'), ('g-20', 'c-1')]]   # sentence 3
graph = nx.Graph()
for sentence_edges in elist:
    for fromnode, tonode in sentence_edges:
        graph.add_edge(fromnode, tonode)
nx.draw(graph, with_labels=True, node_color='lightblue')
# Create a data frame to store distances from the element in column e1 to e2
DF = pd.DataFrame({"e1": ['c-1', 'a-23', 'c-1', 'g-2'],
                   "e2": ['b-17', 'a-5', 'g-20', 'g-20']})
DF
This is the final step. Calculate shortest paths and store them.
pathlist, len_list = [], []  # placeholders
for row in DF.itertuples():
    so, tar = row[1], row[2]  # row[0] is the index; row[1] is e1, row[2] is e2
    path = nx.shortest_path(graph, source=so, target=tar)
    length = nx.shortest_path_length(graph, source=so, target=tar)
    pathlist.append(path)
    len_list.append(length)
# Add these lists as new columns in the DF
DF['length'] = len_list
DF['path'] = pathlist
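As a compact variation on this final step (a sketch, not the code from the answer itself), the columns can be paired by name with zip instead of being read by tuple position. The example is self-contained: it rebuilds the sample graph from the same edges as a flat list, which is an assumption made to keep the block short.

```python
import networkx as nx
import pandas as pd

# Same sample edges as in the first step, flattened into a single graph.
edges = [("a-5", "b-17"), ("b-17", "c-1"), ("c-1", "a-23"),
         ("b-17", "g-2"), ("g-20", "c-1")]
graph = nx.Graph(edges)

DF = pd.DataFrame({"e1": ["c-1", "a-23", "c-1", "g-2"],
                   "e2": ["b-17", "a-5", "g-20", "g-20"]})

# Pair the two columns by name, so nothing depends on column order.
pairs = list(zip(DF["e1"], DF["e2"]))
DF["path"] = [nx.shortest_path(graph, source=s, target=t) for s, t in pairs]
DF["length"] = [nx.shortest_path_length(graph, source=s, target=t)
                for s, t in pairs]
```

For this sample graph the lengths come out as 1, 3, 1 and 3 (counted in edges); for example, a-23 reaches a-5 via c-1 and b-17.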
Which produces the desired resulting data frame:
Hope this helps you.
Upvotes: 3