Reputation: 429
I have a dataframe with two columns that form an edge list, and I want to create a graph from it using PySpark or Python. Can anyone suggest how to do it? In R it can be done with the command below from igraph:
graph.edgelist(as.matrix(df))
My input dataframe is df:
valx valy
1: 600060 09283744
2: 600131 96733110
3: 600194 01700001
My output should look like the following (it's basically all valx and valy values under V1 and their membership info under V2):
V1 V2
600060 1
96733110 1
01700001 2
Upvotes: 0
Views: 1553
Reputation: 320
Judging by your desired output, you don't seem to want a graph but rather an array that shows which row each V1 value was originally stored in, and you can get that from your original dataframe.
I'm going to assume that what you actually want is to turn the dataframe into a graph, not the above.
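import networkx as nx
import pandas as pd
# Path to the CSV file that holds the edge list
filelocation = r'C:\Users\Documents\Tilo Edgelist'
# Read the edge list into a pandas dataframe
Panda_edgelist = pd.read_csv(filelocation)
# Build the graph, using valx as the source column and valy as the target column
g = nx.from_pandas_edgelist(Panda_edgelist,'valx','valy')
# Draw the graph with node labels and no node markers
nx.draw(g,with_labels = True,node_size = 0)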
The code above creates a graph in Python and draws it using the draw function from networkx.
I've assumed that you're creating the dataframe by reading in some sort of file.
If you can convert this file into a CSV file, then you can read it into a dataframe with pandas.
The CSV file I used is formatted as follows:
valx,valy
600060,09283744
600131,96733110
600194,01700001
Substitute the path between the quotation marks with the path to your CSV file.
Below you can see what the dataframe returned by pd.read_csv looks like. Note that the values are parsed as integers, so the leading zeros in valy are dropped.
valx valy
0 600060 9283744
1 600131 96733110
2 600194 1700001
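If the leading zeros matter (the question's expected output shows 01700001 with its zero intact), one option is to read the IDs as strings. This is only a sketch: the CSV is inlined with io.StringIO so it runs without a file, and the variable name edges is my own, not something from the question.

```python
import io

import pandas as pd

# Inline copy of the sample CSV so the sketch runs without a file on disk.
csv_text = """valx,valy
600060,09283744
600131,96733110
600194,01700001
"""

# dtype=str keeps every ID as text, so leading zeros are preserved.
edges = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(edges)
```

With a real file you would pass the file path to pd.read_csv instead of the StringIO object and keep dtype=str.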
So we then pass this dataframe to networkx to create the graph:
g = nx.from_pandas_edgelist(Panda_edgelist,'valx','valy')
In the function above, you can see I've given it the argument Panda_edgelist and then 'valx' and 'valy' as the source and target node column names, respectively. It uses these arguments to create a graph called g.
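To check that the graph was built as expected without drawing it, you can inspect its nodes and edges. The sketch below rebuilds the sample dataframe by hand (with the integer values pandas parsed) instead of reading a file, so it is self-contained.

```python
import networkx as nx
import pandas as pd

# Hand-built copy of the sample edge list, as parsed by pd.read_csv above.
Panda_edgelist = pd.DataFrame(
    {
        "valx": [600060, 600131, 600194],
        "valy": [9283744, 96733110, 1700001],
    }
)

# One edge per row: valx is the source node, valy is the target node.
g = nx.from_pandas_edgelist(Panda_edgelist, "valx", "valy")

print(g.number_of_nodes())  # count of distinct node IDs
print(g.number_of_edges())  # count of edges
print(list(g.edges()))      # the edges as (source, target) pairs
```

By default this gives an undirected graph. If the direction from valx to valy matters, passing create_using=nx.DiGraph should give a directed graph instead.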
Finally, I've drawn the generated graph to the console using nx.draw.
nx.draw(g,with_labels = True,node_size = 0)
This function needs you to pass it the graph, g in our case. with_labels = True is used to draw the node names/IDs. node_size = 0 sets the size of the drawn nodes to 0. If you don't give the function this argument, it draws small filled circles to represent the nodes in the graph (red in older networkx versions).
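If the "membership" column in the question is meant the way igraph uses the term, i.e. an ID for the connected component each node belongs to, then networkx can produce a similar table. This is a guess at the intent, not something the question confirms. The sketch keeps the IDs as strings, and the names rows and membership are my own.

```python
import networkx as nx
import pandas as pd

# Sample edge list from the question, with IDs kept as strings.
edges = pd.DataFrame(
    {
        "valx": ["600060", "600131", "600194"],
        "valy": ["09283744", "96733110", "01700001"],
    }
)
g = nx.from_pandas_edgelist(edges, "valx", "valy")

# Number the connected components from 1 and record each node's component.
rows = [
    (node, component_id)
    for component_id, component in enumerate(nx.connected_components(g), start=1)
    for node in sorted(component)
]

# V1 holds the node ID, V2 the ID of the component it belongs to.
membership = pd.DataFrame(rows, columns=["V1", "V2"])
print(membership)
```

In the sample data no node appears in more than one edge, so each row of the edge list ends up as its own component. Nodes only share a membership ID when a chain of edges connects them.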
Upvotes: 1