Reading unicode from sqlite db using python

Question

The data is stored as Unicode in the database and has to be retrieved and converted into a different form.

The following snippet

import sqlite3

def convert(content):
    content = content.replace("ஜௌ", "n\[s");
    return content;

mydatabase = "database.db"
connection = sqlite3.connect(mydatabase)
cursor = connection.cursor()
query = ''' select unicode_data from table1'''
cursor.execute(query)
for row in cursor.fetchone():
    print convert(row)

raises the following error inside the convert function:

exceptions.UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 0: ordinal not in range(128)

If the database content is "ஜௌஜௌஜௌ", the output should be "n\[sn\[sn\[s".

The documentation suggests passing errors='ignore' or errors='replace' when creating the unicode string to avoid this error.

When the iteration is changed as follows:

for row in cursor.fetchone():
    print convert(unicode(row, errors='replace'))

it raises

exceptions.TypeError: decoding Unicode is not supported

which indicates that row is already a unicode object.

Any light on this to make it work is highly appreciated. Thanks in advance.

bobince · Accepted Answer

content = content.replace("ஜௌ", "n\[s");

Suggest you mean:

content = content.replace(u'ஜௌ', ur'n\[s');

or for safety where the encoding of your file is uncertain:

content = content.replace(u'\u0B9C\u0BCC', ur'n\[s');

The content you have is already Unicode, so you should do Unicode string replacements on it. "ஜௌ" without the u is a string of bytes that represents those characters in some encoding dependent on your source file charset. (Byte strings work smoothly together with Unicode strings only in the most unambiguous cases, which is for ASCII characters.)
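To see where the error in the question comes from, here is a minimal sketch. It assumes the source file was saved as UTF-8; the question does not say so, but the byte 0xe0 in the traceback is consistent with it, since ஜ (U+0B9C) starts with that byte in UTF-8. In Python 2, passing a byte string to a method of a Unicode string triggers an implicit ASCII decode of the bytes. The snippet performs that decode explicitly so that it behaves the same on Python 2 and Python 3; the variable names are illustrative only.

```python
# Bytes that a UTF-8 source file would contain for the literal "ஜௌ"
# (assumption: the file encoding is UTF-8).
pattern_bytes = u'\u0B9C\u0BCC'.encode('utf-8')
first_byte = pattern_bytes[:1]

# Python 2 does this decode implicitly when a byte string meets a
# Unicode string; here it is spelled out.
try:
    pattern_bytes.decode('ascii')
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The decode fails on the very first byte, which matches "position 0" in the reported error.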

(The r-string means not having to worry about including bare backslashes.)
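Putting the pieces together, the following is a rough end-to-end sketch rather than a definitive fix. It makes several assumptions: an in-memory database stands in for database.db, table1 has a single TEXT column named unicode_data, and the sample row is the one from the question. Because the ur'' prefix exists only in Python 2, the replacement is written as u'n\\[s' with an escaped backslash, which is the same string and lets the code run on both Python 2 and Python 3.

```python
from __future__ import print_function

import sqlite3


def convert(content):
    # Both arguments are Unicode strings, so no implicit ASCII decode occurs.
    # The \u escapes avoid any dependence on the source file encoding.
    return content.replace(u'\u0B9C\u0BCC', u'n\\[s')


# Assumption: an in-memory database replaces the asker's database.db.
connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
cursor.execute('CREATE TABLE table1 (unicode_data TEXT)')
cursor.execute('INSERT INTO table1 VALUES (?)', (u'\u0B9C\u0BCC' * 3,))

cursor.execute('SELECT unicode_data FROM table1')
# Each fetched row is a tuple; the text value is its first element.
results = [convert(row[0]) for row in cursor.fetchall()]
connection.close()

for line in results:
    print(line)
```

Iterating over fetchall() and indexing row[0] makes it explicit that each row is a tuple of columns, whereas the loop in the question iterates over the columns of a single row.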
