M. Vlad
M. Vlad

Reputation: 137

How to fix: "sqlite3.OperationalError: Could not decode to UTF-8"

Using the Python sqlite3 package I'm selecting some rows from an sqlite database (with a triple inner join) in order to put them in a pandas dataframe at a later stage.

I've tried running the following code to select from the database:

import pandas as pd
import sqlite3

# connect to the database
conn = sqlite3.connect("C:\Quant\Data_Storage2\sqlite\db\scorecard.db")
conn.text_factory = str
c = conn.cursor()

# create pandas df with data from the sqlite db
sql = '''SELECT MACRO_SA.SERIES_ID, MACRO_SA.COUNTRY_ID, MACRO_SA.DATA, 
         MACRO_SA.REL_DATE, HF_DATA.DATA
         FROM MACRO_SA
            INNER JOIN MAP_SERIES ON MAP_SERIES.CUR = MACRO_SA.COUNTRY_ID
            INNER JOIN HF_DATA ON MAP_SERIES.CUR = MACRO_SA.COUNTRY_ID
         WHERE (MAP_SERIES.CRNCY NOT NULL)  
            AND (HF_DATA.SERIES_ID = 'bg_cur')
'''
df_to_adj = c.execute(sql).fetchall()
conn.commit()
df_to_adj = pd.DataFrame(df_to_adj, columns=['SERIES_ID', 'COUNTRY_ID', 'DATA', 'REL_DATE', 'FX_RATE'])

I get the following error: "sqlite3.OperationalError: Could not decode to UTF-8". Any help is much appreciated.

Upvotes: 0

Views: 3284

Answers (2)

JoS
JoS

Reputation: 86

I had a similar problem. The text in the database seemed to be encoded in another encoding than utf-8. To find out the which encodings worked instead I used a helper function to try out all encodings:

from encodings.aliases import aliases
_encodings = set(aliases.values())

def try_encodings(byte_text: bytes):
    for encoding in _encodings:
        try:
            print(f'Encoding {encoding}: {byte_text.decode(encoding)}')
        except (UnicodeDecodeError, LookupError):
            pass

Set text_factory to bytes and iterate row per row and decode the problem column. If the row with the decoding problem is found, try out all other encodings:

c = conn.cursor()
conn.text_factory = bytes
sql = 'Select * from ...'
c.execute(sql)

col_ix = 0  # Set column index of problem column
while row := c.fetchone():
    try:
        text = row[col_ix].decode()
    except UnicodeDecodeError as e:
        print('Cannot decode byte string: ', row[col_ix])
        try_encodings(row[col_ix])
        break

If you found the correct encoding, you can set it to text_factory. For example cp852:

conn.text_factory = lambda x: x.decode('cp852')

Upvotes: 2

Markus Stenberg
Markus Stenberg

Reputation: 11

conn.text_factory = str 

works with Python 2; if you're using Python 3, you have to provide your own magic function which takes in bytes, and spits out whatever you want the string to be. One option is trying to decode the string, and if it fails, return the bytes raw. If you can always handle bytes, just doing

conn.text_factory = bytes

is enough. See https://docs.python.org/3.7/library/sqlite3.html for the gory details.

Upvotes: 0

Related Questions