How to create a graph using nx.read_edgelist if my csv file is present at S3?

Question

I have a CSV file in one of my S3 buckets (s3://abc/FB/train_woheader.csv). When I write

g=nx.read_edgelist('s3://abc/FB/train_woheader.csv',delimiter=',',create_using=nx.DiGraph(),nodetype=int, encoding='utf-8')
print(nx.info(g))

it says

FileNotFoundError: [Errno 2] No such file or directory: 's3://abc/FB/train_woheader.csv'

However, if I save the csv in the Jupyter instance then I am able to create the graph using the line

g=nx.read_edgelist('train_woheader.csv',delimiter=',',create_using=nx.DiGraph(),nodetype=int, encoding='utf-8')

The CSV is a large file, so it needs to stay in S3. It can't be saved on the Jupyter instance because it takes up too much disk space.

Any help on this?

balderman · Accepted Answer

read_edgelist expects a file name or an open file handle as its argument, so an s3:// URL is treated as a local path and is not found.
What you can do is read the file from S3 (using boto3), write its contents into a StringIO buffer, and pass the populated buffer to read_edgelist:

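import io
import networkx as nx

# Fill an in-memory text buffer with the CSV contents fetched from S3 via boto3
with io.StringIO() as f:
    f.write('data_coming_from_s3_using_boto3')  # placeholder for the real CSV text
    f.seek(0)  # rewind so read_edgelist starts reading from the beginning
    g = nx.read_edgelist(f, delimiter=',', create_using=nx.DiGraph(), nodetype=int, encoding='utf-8')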
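To make the boto3 step concrete, here is a minimal sketch of one way to do it. The function names graph_from_csv_bytes and graph_from_s3 are made up for illustration. It assumes AWS credentials are already configured on the instance, and it uses a BytesIO buffer rather than StringIO, since read_edgelist decodes binary lines itself using the encoding argument. Note that this avoids the local disk but still holds the whole file in memory while parsing.

```python
import io

import networkx as nx


def graph_from_csv_bytes(raw: bytes) -> nx.DiGraph:
    """Build a directed graph from the raw bytes of a comma-separated edge list."""
    # read_edgelist accepts an open file handle; lines from a binary
    # buffer are decoded with the given encoding.
    return nx.read_edgelist(
        io.BytesIO(raw),
        delimiter=",",
        create_using=nx.DiGraph(),
        nodetype=int,
        encoding="utf-8",
    )


def graph_from_s3(bucket: str, key: str) -> nx.DiGraph:
    """Fetch an object from S3 and parse it as an edge list (hypothetical helper)."""
    import boto3  # imported here so the parsing helper works without boto3

    obj = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return graph_from_csv_bytes(obj["Body"].read())


# Usage with the bucket and key from the question:
# g = graph_from_s3("abc", "FB/train_woheader.csv")
```

For the path in the question, the bucket is abc and the key is FB/train_woheader.csv.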
