Reputation: 922
I'm reading JSON data from a large data file to convert its contents to CSV, and I'm getting an error:
Traceback (most recent call last):
  File "python/gamesTXTtoCSV.py", line 99, in <module>
    writer.writerow(foo)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 15: ordinal not in range(128)
After some digging I've found that the string "\u2013" shows up in the JSON data file.
Example (see the value field):
"states":[
    {
        "display":null,
        "name":"choiceText",
        "type":"string",
        "value":"Show me around \u2013 as long as your friends don't chase me away again!"
    },
I've tried various string replacements in the script to get rid of the offending string.
Stuff like (where i['value'] is the offending field):
i['value'].replace("\\u2013", "--")
Or
i['value'].replace("\\", "") #this one is the last resort
Or even
i['value'].encode("utf8")
But to no avail - I keep getting the error. Any idea what's going on?
Here's the section of code that writes the csv, in case additional context is needed:
################## filling out the csv ################
openfile= open(inFile)
f = open(outFile, 'wt')
writer = csv.writer(f)
writer.writerow(all_cols)
for row in openfile.readlines():
    line = json.loads(row)
    stateCSVrow= []
    states=line['states']
    contexts=line['context']
    contextCSVrow=[]
    k = 0
    for state in state_names:
        for i in states:
            if i['name']==state:
                i['value'].replace("\u2019", "'") ####THE SECTION GIVING ISSUE
                i['value'].replace("\u2013", "--")
                stateCSVrow.append(i['value'])
        if len(stateCSVrow)==k:
            stateCSVrow.append('NA')
        k +=1
    c = 0
    for context in context_names:
        for i in contexts:
            if i['name']==context:
                contextCSVrow.append(i['value'])
        if len(contextCSVrow)==c:
            contextCSVrow.append('NA')
        c +=1
    first=[]
    first.extend([
        line['key'] ,
        line['timestamp'],
        line['actor']['actorType'],
        line['user']['username'],
        line['version'],
        line['action']['name'],
        line['action']['actionType']
    ])
    foo = first + stateCSVrow + contextCSVrow
    writer.writerow(foo)
Upvotes: 0
Views: 120
Reputation: 51807
You're trying to replace the repr of a Unicode escape sequence. Don't do that.
In [3]: x = 'fnord \u2034'
In [4]: x
Out[4]: 'fnord ‴'
In [5]: x.replace('\u2034', 'hi')
Out[5]: 'fnord hi'
(IPython with 3.5 on Arch Linux)
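It works the same in Python 2, except that a plain (non-u) string literal there keeps \u2013 as literal backslash text, which is why the repr below shows a doubled backslash: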
⚘ python2
Python 2.7.11 (default, Dec 6 2015, 15:43:46)
[GCC 5.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "Show me around \u2013 as long as your friends don't chase me away again!"
>>> x
"Show me around \\u2013 as long as your friends don't chase me away again!"
>>> x.replace('\u2013', '--')
"Show me around -- as long as your friends don't chase me away again!"
Upvotes: 1
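Following on from the REPL sessions above, here is a minimal sketch of two things that probably bite in the question's loop. It assumes Python 3, and the one-record input and the clean_value helper are made up for illustration. First, json.loads has already decoded \u2013 into one real character, so looking for the six-character escape text matches nothing. Second, str.replace returns a new string, so calling it without keeping the result changes nothing.

```python
import csv
import io
import json

# One JSON record shaped like the "states" entry in the question. The raw
# string keeps \u2013 as six literal characters, as it appears in the file.
raw = r'''{"name": "choiceText", "value": "Show me around \u2013 as long as your friends don't chase me away again!"}'''

state = json.loads(raw)
value = state["value"]

# json.loads has already decoded the escape into one real character (an en
# dash), so searching for the six-character escape text finds nothing.
print("\\u2013" in value)   # False
print("\u2013" in value)    # True

# str.replace returns a new string; it never changes the original.
value.replace("\u2013", "--")   # result thrown away
print("\u2013" in value)        # True, value is unchanged


def clean_value(text):
    """Hypothetical helper: swap the two characters from the question for ASCII."""
    return text.replace("\u2013", "--").replace("\u2019", "'")


cleaned = clean_value(value)    # keep the returned string

# Write the cleaned value; io.StringIO stands in for the output file.
out = io.StringIO()
csv.writer(out).writerow([state["name"], cleaned])
print(out.getvalue())
```

On Python 2 the same two points apply, but the literals would need a u prefix (u"\u2013"), and the csv module there expects byte strings, so any remaining non-ASCII value would also need .encode("utf-8") with the result kept rather than discarded.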