Tilo

How to create a graph from an edge list using GraphFrame

I have a DataFrame with two columns that form an edge list, and I want to create a graph from it using PySpark or Python. Can anyone suggest how to do it? In R, it can be done with the following igraph command:

graph.edgelist(as.matrix(df))
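For comparison, a rough Python counterpart of that R call is sketched below using networkx (my choice of library, the same one the answer uses). igraph's graph.edgelist builds a directed graph by default, so this sketch uses nx.DiGraph. The edge list is typed in from the sample rows as strings so that leading zeros survive.

```python
import networkx as nx

# Edge list taken from the sample rows; IDs kept as strings so
# leading zeros such as "09283744" are not lost
edges = [
    ("600060", "09283744"),
    ("600131", "96733110"),
    ("600194", "01700001"),
]

# DiGraph mirrors igraph's graph.edgelist(), which is directed by default
g = nx.DiGraph(edges)

print(g.number_of_nodes(), g.number_of_edges())  # 6 3
```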

My input DataFrame is df:

   valx     valy
1: 600060   09283744
2: 600131   96733110
3: 600194   01700001

My output should look like the following (it's basically every valx and valy value under V1, with its membership information under V2):

V1               V2
600060           1
96733110         1
01700001         2
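If V2 is meant to be connected-component membership (what igraph's components() reports as membership; this reading of the question is my assumption), it can be computed without any graph library using a small union-find. The function name component_membership is hypothetical, and components are numbered from 1 in the order their first node appears.

```python
def component_membership(edges):
    """Map each node to a 1-based connected-component ID (union-find)."""
    parent = {}

    def find(x):
        # Register unseen nodes, then walk up to the root with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of the two endpoints of every edge
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Number components in first-seen order: V1 = node, V2 = component ID
    ids, rows = {}, []
    for node in list(parent):
        root = find(node)
        ids.setdefault(root, len(ids) + 1)
        rows.append((node, ids[root]))
    return rows


edges = [("600060", "09283744"), ("600131", "96733110"), ("600194", "01700001")]
for v1, v2 in component_membership(edges):
    print(v1, v2)
```

With the three sample rows no node appears in more than one edge, so each edge forms its own component and the IDs run 1, 1, 2, 2, 3, 3.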


Answers (1)

Ankur

Judging by your desired output, you don't seem to want a graph but rather an array showing which row each V1 value was originally stored in, which you can get from your original DataFrame.

I'm going to assume that what you actually want is to turn the DataFrame into a graph, not the above.

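import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Path to your edge-list CSV file (replace with your own path)
filelocation = r'C:\Users\Documents\Tilo Edgelist'

# Read the edge list into a DataFrame
Panda_edgelist = pd.read_csv(filelocation)

# Build an undirected graph: 'valx' is the source column, 'valy' the target
g = nx.from_pandas_edgelist(Panda_edgelist, 'valx', 'valy')

# Draw the graph with node labels and no node markers
nx.draw(g, with_labels=True, node_size=0)
plt.show()  # needed to display the figure when run as a plain script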

The code above creates a graph in Python. Below is what the output looks like if you draw the graph with the draw function from networkx.

[Figure: graph output drawn to the console]

I've assumed that you're creating the DataFrame by reading in some sort of file.

If you can convert this file into a CSV file, you can read it into a DataFrame with pandas.

The CSV file I used has the following format:

valx,valy
600060,09283744
600131,96733110
600194,01700001

Replace the file path between the quotation marks with the path to your CSV file.

Below is what the DataFrame returned by pd.read_csv looks like:

   valx      valy
0  600060   9283744
1  600131  96733110
2  600194   1700001
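Notice that pandas parsed both columns as integers, so the leading zeros in 09283744 and 01700001 were dropped. If these values are IDs rather than numbers, a minimal fix is to pass dtype=str to pd.read_csv. The sketch below uses io.StringIO as a stand-in for the file so it runs on its own; with a real file you would pass the path instead.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file described above
csv_text = """valx,valy
600060,09283744
600131,96733110
600194,01700001
"""

# dtype=str keeps every column as text, so leading zeros survive
Panda_edgelist = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(Panda_edgelist)
```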

We then pass this DataFrame to networkx to create the graph:

g = nx.from_pandas_edgelist(Panda_edgelist,'valx','valy')

In the call above, I pass Panda_edgelist as the edge-list DataFrame, followed by 'valx' and 'valy' as the source and target column names, respectively. networkx uses these arguments to build a graph called g with one edge per row; by default this graph is undirected (nx.Graph).

Finally, I draw the generated graph to the console with nx.draw:
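A self-contained version of this step is sketched below, building the DataFrame in memory from the question's three rows (kept as strings). It also shows the create_using parameter: passing a nx.DiGraph instance gives a directed graph whose edges point from valx to valy, which is closer to igraph's default.

```python
import networkx as nx
import pandas as pd

# The question's three edges, kept as strings to preserve leading zeros
Panda_edgelist = pd.DataFrame(
    {"valx": ["600060", "600131", "600194"],
     "valy": ["09283744", "96733110", "01700001"]}
)

# Undirected graph (the default, nx.Graph)
g = nx.from_pandas_edgelist(Panda_edgelist, "valx", "valy")

# Directed graph: each edge points from valx to valy
dg = nx.from_pandas_edgelist(Panda_edgelist, "valx", "valy",
                             create_using=nx.DiGraph())

print(list(dg.successors("600060")))  # ['09283744']
```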

nx.draw(g, with_labels=True, node_size=0)

This function takes the graph to draw as its first argument (g in our case).

with_labels=True draws the node names/IDs.

node_size=0 sets the size of each drawn node to 0, so only the labels and edges are visible. If you don't pass this argument, the function draws a filled circle for each node (small red circles in the version I used; recent networkx versions default to blue).
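If you run this as a plain script or on a machine without a display, nothing may appear on screen. One option, sketched here as an assumption about your setup, is to select matplotlib's non-interactive Agg backend and save the drawing to a file instead; the file name graph.png is just an example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files, no window
import matplotlib.pyplot as plt
import networkx as nx

g = nx.Graph([("600060", "09283744"), ("600131", "96733110"),
              ("600194", "01700001")])

# Same drawing options as above: labels on, node markers hidden
nx.draw(g, with_labels=True, node_size=0)

# Save into a temporary directory under the example name graph.png
out_path = os.path.join(tempfile.mkdtemp(), "graph.png")
plt.savefig(out_path)
plt.close()
```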
