JoeM05

Reputation: 922

Python backslash replacement failing

I'm reading JSON data from a large data file to convert its contents to CSV, and I'm getting an error:

Traceback (most recent call last):
  File "python/gamesTXTtoCSV.py", line 99, in <module>
    writer.writerow(foo)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 15: ordinal not in range(128)

After some digging I've found that the string "\u2013" shows up in the JSON data file.

Example (see the value field):

"states":[
      {
         "display":null,
         "name":"choiceText",
         "type":"string",
         "value":"Show me around \u2013 as long as your friends don't chase me away again!"
      },

I've tried various string replacements in the script to get rid of the offending string.

Things like (where i['value'] is the offending field):

i['value'].replace("\\u2013", "--")

Or

i['value'].replace("\\", "") #this one is the last resort

Or even

i['value'].encode("utf8")

But to no avail - I keep getting the error. Any idea what's going on?

Here's the section of code that writes the csv, in case additional context is needed:

################## filling out the csv ################
openfile= open(inFile)
f = open(outFile, 'wt')
writer = csv.writer(f)
writer.writerow(all_cols)

for row in openfile.readlines():
    line = json.loads(row)
    stateCSVrow= []
    states=line['states']
    contexts=line['context']
    contextCSVrow=[]
    k = 0
    for state in state_names:
        for i in states:
            if i['name']==state:
                i['value'].replace("\u2019", "'") ####THE SECTION GIVING ISSUE
                i['value'].replace("\u2013", "--")
                stateCSVrow.append(i['value'])
        if len(stateCSVrow)==k:
            stateCSVrow.append('NA')
        k +=1
    c = 0
    for context in context_names:
        for i in contexts:
            if i['name']==context:
                contextCSVrow.append(i['value'])
        if len(contextCSVrow)==c:
            contextCSVrow.append('NA')
        c +=1
    first=[]
    first.extend([
        line['key'] ,
        line['timestamp'],
        line['actor']['actorType'],
        line['user']['username'],
        line['version'],
        line['action']['name'],
        line['action']['actionType']
          ])

    foo = first + stateCSVrow + contextCSVrow
    writer.writerow(foo)

Upvotes: 0

Views: 120

Answers (1)

Wayne Werner

Reputation: 51807

You're trying to replace the repr of a Unicode escape sequence. Don't do that; replace the character the escape stands for:

In [3]: x = 'fnord \u2034'

In [4]: x
Out[4]: 'fnord ‴'

In [5]: x.replace('\u2034', 'hi')
Out[5]: 'fnord hi'

(IPython with 3.5 on Arch Linux)
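
For the JSON case in the question, here is a minimal sketch of what is probably going wrong. It assumes a shortened, made-up JSON line shaped like the "states" entry above. json.loads turns the \u2013 escape into the single character U+2013, so searching for "\\u2013" (a literal backslash followed by u2013) finds nothing. Separately, replace() returns a new string, so its result has to be assigned back:

```python
import json

# Shortened, made-up line shaped like the "states" entry in the question.
raw = r'{"name": "choiceText", "value": "Show me around \u2013 again!"}'
state = json.loads(raw)

# After decoding, the value holds the single character U+2013 (en dash),
# not the six characters backslash, u, 2, 0, 1, 3.
assert u"\u2013" in state["value"]
assert "\\u2013" not in state["value"]

# replace() returns a new string; calling it without using the result
# leaves the original value untouched.
state["value"].replace(u"\u2013", u"--")
unchanged = state["value"]

# Assign the result back to keep the change.
state["value"] = state["value"].replace(u"\u2013", u"--")
cleaned = state["value"]
```

The same applies to the encode("utf8") attempt: its return value also has to be kept for it to have any effect.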

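The same call also succeeds in Python 2, although there a plain (non-unicode) string literal keeps \u2013 as six literal characters, as the repr shows: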

⚘ python2
Python 2.7.11 (default, Dec  6 2015, 15:43:46)
[GCC 5.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "Show me around \u2013 as long as your friends don't chase me away again!"
>>> x
"Show me around \\u2013 as long as your friends don't chase me away again!"
>>> x.replace('\u2013', '--')
"Show me around -- as long as your friends don't chase me away again!"

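If you would rather keep the en dash than replace it, the UnicodeEncodeError itself can be avoided by handing the csv module UTF-8 data. In Python 2, csv.writer works on byte strings and falls back to the ascii codec for unicode values, which is where the traceback comes from. The sketch below is one possible approach, not the only one. The helper names open_csv_for_write and to_cell are hypothetical, and the row is made-up sample data:

```python
import csv
import io
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2


def open_csv_for_write(path):
    """Hypothetical helper: open `path` the way csv expects on each version."""
    if PY2:
        # Python 2's csv module writes byte strings, so open in binary mode.
        return open(path, "wb")
    # Python 3's csv module writes text; set the encoding explicitly.
    return io.open(path, "w", newline="", encoding="utf-8")


def to_cell(value):
    """Hypothetical helper: UTF-8 encode unicode values on Python 2 only."""
    if PY2 and isinstance(value, unicode):  # noqa: F821 (Python 2 name)
        return value.encode("utf-8")
    return value


# Made-up sample row containing the en dash from the question.
row = [u"choiceText", u"Show me around \u2013 again!", 42]

path = os.path.join(tempfile.mkdtemp(), "out.csv")
f = open_csv_for_write(path)
try:
    csv.writer(f).writerow([to_cell(v) for v in row])
finally:
    f.close()

# Read the file back as UTF-8 to confirm the en dash survived.
with io.open(path, encoding="utf-8", newline="") as check:
    written = check.read()
```

In the script from the question, that would mean passing each value in foo through something like to_cell before calling writer.writerow(foo).
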
Upvotes: 1
