I am writing a script to read a CSV file and write the data into a graph using pygraphml.
The issue is that the first column of the file contains data like the following, and I am not able to read it:
Master Muppet ™ joèl b Kýrie, eléison
This is my Python script:
```python
import csv
import sys
from pygraphml import Graph
from pygraphml import GraphMLParser

#reload(sys)
#sys.setdefaultencoding("utf8")

data = []  # network data to write
g = Graph()  # graph for networks

# Open the file and retrieve the target rows
with open(r"C:\Users\csvlabuser\Downloads\test.csv", "r") as fp:
    reader = csv.reader(fp)
    unread_count = 2
    completed_list = []
    try:
        for rows in reader:
            if "tweeter_id" == rows[2]:  # skip and check the header
                print("tweeter_id column found")
                continue
            #if rows[2] not in completed_list:
            n = g.add_node(rows[2].encode("utf8"))
            completed_list.append(rows[2])
            n['username'] = rows[0].encode("utf8")
            n['userid'] = rows[1]
            if rows[3] != "NULL":  # edges exist only when there is a retweet id
                g.add_edge_by_label(rows[2], rows[3])
            print unread_count
            unread_count += 1
    except:
        pass
    fp.close()
print unread_count
g.show()

# Write the graph into GraphML file format
parser = GraphMLParser()
parser.write(g, "myGraph.graphml")
```
Kindly let me know where the issue is.
Thanks in advance.
The Python 2 `csv` module cannot handle `unicode` input or input containing NUL bytes (see the note at the top of the module page). Since you're using `print` as a keyword rather than a function, I'm guessing you're using Python 2. To use `csv` with Unicode in Python 2, you must convert to UTF-8 encoding.
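For contrast (this goes beyond the original answer): on Python 3 the `csv` module reads `str` text directly, so non-ASCII names need no wrapper as long as the file is opened with the correct encoding. A minimal sketch, assuming the file is UTF-8; an in-memory `io.StringIO` stands in for `test.csv`, and the sample rows and the `retweet_id` column name are hypothetical, made up for illustration:

```python
import csv
import io

# Stand-in for the real file. On disk you would instead use
# open(path, newline="", encoding="utf-8"); UTF-8 is an assumption here.
# The rows and the "retweet_id" header are hypothetical sample data.
sample = io.StringIO(
    "username,userid,tweeter_id,retweet_id\r\n"
    "Master Muppet ™,1,100,NULL\r\n"
    "joèl b,2,101,100\r\n"
    '"Kýrie, eléison",3,102,100\r\n'
)

reader = csv.reader(sample)
header = next(reader)  # first row is the header
rows = list(reader)

# Each cell is already a str, so the non-ASCII characters come through
# intact without any .encode()/.decode() calls.
usernames = [row[0] for row in rows]
```

The quoted field keeps the comma inside `Kýrie, eléison` as part of a single value.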
The `csv` module's Examples section contains definitions for wrappers (`UTF8Recoder`, `UnicodeReader`, `UnicodeWriter`) that allow you to parse inputs in arbitrary encodings, seamlessly fixing up encodings so `csv` can process the inputs, then decoding back to Python `unicode` objects (which represent the text as "pure" Unicode text, not a specific byte encoding).
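The wrapper recipes in the Python 2 documentation don't run on Python 3. If you later move to Python 3, a rough equivalent of the same idea (decode bytes in some arbitrary encoding before `csv` sees them) is to wrap the binary stream in `io.TextIOWrapper`. This is a sketch under assumptions: `unicode_rows` is a hypothetical helper name, not the documented recipe, and the cp1252 (Windows-1252) input encoding is only an example:

```python
import csv
import io


def unicode_rows(byte_stream, encoding="utf-8", **fmtparams):
    """Return a csv.reader yielding rows of str from a binary stream.

    Hypothetical helper, loosely analogous to the UnicodeReader recipe:
    TextIOWrapper decodes from `encoding`, and newline="" lets csv handle
    line endings itself, as the csv docs recommend for file objects.
    """
    text_stream = io.TextIOWrapper(byte_stream, encoding=encoding, newline="")
    return csv.reader(text_stream, **fmtparams)


# Sample bytes encoded as Windows-1252 instead of UTF-8 (assumed for illustration).
raw = io.BytesIO(
    "Master Muppet ™,1\r\njoèl b,2\r\nKýrie eléison,3\r\n".encode("cp1252")
)
rows = list(unicode_rows(raw, encoding="cp1252"))
```

For a real file, `open(path, newline="", encoding="cp1252")` does the same job in one step; the wrapper form is only useful when you already hold a binary stream.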