Choice

Reputation: 3

Encoding problems in Python x64

I'm trying to write a little script that fills a SQLite table from a list of files saved in a text file. The code so far is this:

import os
import _sqlite3
import sys

print sys.path[0]
mydir = sys.path[0]
print (mydir)

def listdir(mydir):
    lis=[]
    for root, dirs, files in os.walk(mydir):
        for name in files:
            lis.append(os.path.join(root,name))
    return lis

filename = "list.txt"
print ("writting in %s" % filename)
file = open(filename, 'w' )
for i in listdir(mydir):
    file.write(i)
    file.write("\n")
file.close()

con = _sqlite3.connect("%s/conection"%mydir)
c=con.cursor()

c.execute(''' drop table files ''')
c.execute('create table files (name text, other text)')
file = open(filename,'r')
for line in file :
    a = 1
    for t in [("%s"%line, "%i"%a)]:
        c.execute('insert into files values(?,?)',t)
        a=a+1
c.execute('select * from files')
print c.fetchall()
con.commit()
c.close()

When I run it, I get the following:

Traceback (most recent call last):
  File "C:\Users\josh\FORGE.py", line 32, in <module>
    c.execute('insert into files values(?,?)',t)
ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

I've tried the unicode() built-in function, but it still doesn't work; it says it can't decode the character 0xed or something like that.

I know the problem is the encoding of the strings in the list, but I can't find a way to get them right. Any ideas? Thanks in advance!

Upvotes: 0

Views: 225

Answers (1)

knitti

Reputation: 7033

(zero). Please reformat your code.

  1. After for line in file:, do something like line = line.decode('encoding-of-the-file'), where the encoding is something like utf-8 or iso-8859-1. You have to know your input encoding.

    If you don't know the encoding, or don't care about a clean decoding, you can guess the most probable encoding and call line.decode('utf-8', 'ignore'), which omits every character that cannot be decoded. You can also use 'replace', which replaces those characters with the Unicode replacement character (\ufffd).

  2. Internally, and when communicating with the database, use only unicode objects, e.g. u'this is unicode'.

(3). Don't use file as a variable name; in Python 2 it shadows the built-in file type.
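
To illustrate point 1, here is a small sketch of the decoding options. The sample bytes are made up for the demonstration (an assumption, not taken from your list.txt); 0xed is 'í' in iso-8859-1 but is not valid on its own in utf-8, which would explain the error you saw.

```python
# Hypothetical sample line; 0xed is a single byte that utf-8 cannot decode here.
raw = b"m\xedsica.txt\n"

# Decoding with the correct encoding gives clean unicode text.
strict_ok = raw.decode("iso-8859-1")

# Guessing utf-8 and dropping whatever cannot be decoded.
ignored = raw.decode("utf-8", "ignore")

# Guessing utf-8 and marking undecodable bytes with U+FFFD.
replaced = raw.decode("utf-8", "replace")

# Strict decoding with the wrong encoding raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Note that 'ignore' and 'replace' lose information, so the stored names will no longer match the real file names on disk. Knowing the real encoding is the safer route.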

Also look here: Best Practices for Python UnicodeDecodeError
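
Putting the points together, a minimal sketch of the insert loop might look like the following. Several things here are my assumptions rather than anything from your script: the iso-8859-1 encoding, the sample file contents, the in-memory database, the name list_file, and numbering the rows with enumerate. It also uses the public sqlite3 module rather than _sqlite3, and io.open so that lines are decoded while they are read.

```python
import io
import os
import sqlite3
import tempfile

ENCODING = "iso-8859-1"  # assumption: must match how list.txt was written

# Build a small sample list file containing a non-ASCII byte (0xed).
tmpdir = tempfile.mkdtemp()
list_path = os.path.join(tmpdir, "list.txt")
with open(list_path, "wb") as out:
    out.write(b"C:/music/m\xedsica.mp3\nC:/docs/readme.txt\n")

con = sqlite3.connect(":memory:")  # in-memory database, just for the demo
c = con.cursor()
c.execute("create table files (name text, other text)")

# io.open decodes while reading, so every line is already unicode text.
with io.open(list_path, "r", encoding=ENCODING) as list_file:
    for number, line in enumerate(list_file, 1):
        # Strip the trailing newline and pass only unicode values to sqlite.
        c.execute("insert into files values (?, ?)",
                  (line.rstrip("\n"), u"%i" % number))
con.commit()

rows = c.execute("select * from files order by rowid").fetchall()
con.close()
```

Because the values handed to execute are unicode objects, sqlite3 has no 8-bit bytestrings to complain about.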

Upvotes: 1
