Reputation: 32296
I have a dataframe that looks like graph database.
import pandas as pd
mycols=['china', 'england', 'france', 'india', 'pakistan', 'taiwan']
df=pd.DataFrame([[0, 0, 0, 3, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 1, 0, 1, 0, 0],
[3, 1, 1, 0, 1, 0],
[0, 0, 0, 1, 0, 4],
[0, 0, 0, 0, 4, 0]], columns=mycols)
df.index=mycols
The simplified dummy dataframe looks like this:
china england france india pakistan taiwan
china 0 0 0 3 0 0
england 0 0 1 1 0 0
france 0 1 0 1 0 0
india 3 1 1 0 1 0
pakistan 0 0 0 1 0 4
taiwan 0 0 0 0 4 0
Let's assume a user want to go from china to india, there is direct route.
df[df['china'] > 0].index.str.contains('india')
array([ True])
But no direct route to england:
df[df['china'] > 0].index.str.contains('england')
array([False])
In that case, I need to find the common country:
set(df[df.loc['china'] > 0].index.values) & set(df[df.loc['england'] > 0].index.values)
{'india'}
But there are cases where there is no common friend, and I need to find friend of friend to reach the destination. for e.g.
set(df[df.loc['china'] > 0].index.values) & set(df[df.loc['taiwan'] > 0].index.values)
1) In this case how do I write a query that will return china - india - pakistan - taiwan ?
2) Is there any better way to store this? Or SQL like (rows / columns) is ok?
Upvotes: 0
Views: 133
Reputation: 8801
You can do this using Networkx in the following way
Load the graph
import pandas as pd
import networkx as nx
mycols=['china', 'england', 'france', 'india', 'pakistan', 'taiwan']
df=pd.DataFrame([[0, 0, 0, 3, 0, 0],
[0, 0, 1, 1, 0, 0],
[0, 1, 0, 1, 0, 0],
[3, 1, 1, 0, 1, 0],
[0, 0, 0, 1, 0, 4],
[0, 0, 0, 0, 4, 0]], columns=mycols)
#Load the graph from dataframe
G = nx.from_numpy_matrix(df.values)
#set the nodes names
G = nx.relabel_nodes(graph, dict(enumerate(mycols)))
Test if the graph is correctly loaded
print G.edges()
#EdgeView([('pakistan', 'taiwan'), ('pakistan', 'india'), ('england', 'india'), ('england', 'france'), ('india', 'china'), ('india', 'france')])
print graph['china']
#AtlasView({'india': {'weight': 3}})
print graph['england']
#AtlasView({'india': {'weight': 1}, 'france': {'weight': 1}})
Now suppose you need to find all path from china
to india
for path in nx.all_simple_paths(graph, source='china', target='taiwan'):
print path
#Output : ['china', 'india', 'pakistan', 'taiwan']
If you want to find shortest paths from one node to another
for path in nx.all_shortest_paths(graph, source='taiwan', target='india'):
print path
#Output : ['taiwan', 'pakistan', 'india']
You can find multiple other algorithms for finding the shortext path, all-pair shortest path, dijsktra algorithm, etc. at their documentation to suit your queries
Note there might exist a way to load the graph directly from pandas using from_pandas_dataframe, but I was not sure if the use case was correct, as it requires a source and target
Upvotes: 3
Reputation: 5794
Your problem (I am assuming) is basically to find the shortest path between any two given nodes in a weighted graph. Algorithmically speaking, this is called Shortest path problem (or more precisely single-pair shortest path problem). Networkx 2.1 has a function shortest_path
for exactly doing this
From their example,
G = nx.path_graph(5)
>>> print(nx.shortest_path(G, source=0, target=4))
[0, 1, 2, 3, 4]
If the source and target are both specified, return a single list of nodes in a shortest path from the source to the target.
If you want to get the get the shortest path to all the nodes from a source, just skip the target
node (essentially making it a single-source shortest path problem)
Upvotes: 1