Sagar Jha

Reputation: 13

python csv reader + special characters

I am writing a script to read a CSV file and write the data to a graph using pygraphml.

The issue is that the first column of the file contains data like the following, and I am not able to read it:

Master Muppet ™ joèl b Kýrie, eléison

This is my Python script:

import csv
import sys
from pygraphml import Graph
from pygraphml import GraphMLParser

#reload(sys)
#sys.setdefaultencoding("utf8")

data = []  # network data to write
g = Graph() # graph for networks

# Open the file and retrieve the target rows
with open(r"C:\Users\csvlabuser\Downloads\test.csv","r") as fp:
    reader = csv.reader(fp)
    unread_count = 2
    completed_list = []

    try:
        for rows in reader:
            if "tweeter_id" == rows[2]:  # skip and check the header
                print("tweeter_id column found")
                continue
            #if rows[2] not in completed_list:                    
            n = g.add_node(rows[2].encode("utf8"))
            completed_list.append(rows[2])
            n['username'] = rows[0].encode("utf8")
            n['userid'] = rows[1]
            if rows[3] != "NULL":   # edges exist only when there is retweets id
                g.add_edge_by_label(rows[2], rows[3])


            print unread_count
            unread_count +=1

    except:
        pass

fp.close()
print unread_count

g.show()
# Write the graph into graphml file format
parser = GraphMLParser()
parser.write(g, "myGraph.graphml")

Kindly let me know where the issue is.

Thanks in advance.

Upvotes: 0

Views: 3463

Answers (1)

ShadowRanger

Reputation: 155438

The Python 2 csv module cannot handle unicode input or input containing NUL bytes (see the note at the top of the module page). Since you're using print as a statement rather than a function, I'm guessing you're using Python 2. To use csv with Unicode in Python 2, you must hand it UTF-8 encoded byte strings and decode the fields it returns yourself.

The csv module's Examples section contains definitions for wrappers (UTF8Recoder, UnicodeReader, UnicodeWriter) that allow you to parse inputs in arbitrary encodings, seamlessly fixing up encodings so csv can process the inputs, then decoding back to Python unicode objects (that represent the text as "pure" Unicode text, not a specific byte encoding).
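As a rough illustration of the same idea, here is a minimal sketch rather than the wrappers from the docs. It assumes the file is actually UTF-8 encoded (the question doesn't say; a file saved on Windows may well be cp1252, in which case the codec name would need to change). The helper name read_utf8_rows is made up for this example. On Python 2 it reads bytes and decodes each field after csv has parsed it; on Python 3 it lets open() do the decoding.

```python
import csv
import io
import sys

PY2 = sys.version_info[0] == 2


def read_utf8_rows(path):
    """Yield each CSV row as a list of Unicode strings.

    Assumption: the file at `path` is UTF-8 encoded.
    """
    if PY2:
        # Python 2: csv only works on byte strings, so open in binary
        # mode and decode every field once csv has split the row.
        with open(path, "rb") as fp:
            for row in csv.reader(fp):
                yield [cell.decode("utf-8") for cell in row]
    else:
        # Python 3: csv works on text, so decode while reading.
        with io.open(path, "r", encoding="utf-8", newline="") as fp:
            for row in csv.reader(fp):
                yield row
```

With rows read this way, rows[0] is already Unicode text. Whether the later .encode("utf8") calls in your script are still needed depends on what pygraphml expects, which I haven't checked.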

Upvotes: 1
