Reputation: 1759

Sqlite3 cannot correctly query a UTF-8 string?

I'm having a lot of trouble using python's sqlite3 library with UTF-8 strings. I need this encoding because I am working people's names, in my database.

My SQL schema for the desired table is:

CREATE TABLE senators (id integer, name char);

I would like to do the following in Python (ignore the very ugly way I wrote the select statement. I did it this way for debugging purposes):

statement = u"select * from senators where name like '" + '%'+row[0]+'%'+"'"
c.execute(statement)

row[0] is the name of each row in a file that has this type of entry:

Dário Berger,1
Edison Lobão,1
Eduardo Braga,1

While I have a non empty result for names like Eduardo Braga, any time my string has UTF-8 characters, I get a null result.

I have checked that my file has in fact been saved with UTF-8 encoding (Microsoft Notepad). On a Apple mac, in the terminal, I used the PRAGMA command in the sqlite3 shell to check the encoding:

sqlite> PRAGMA encoding;
UTF-8

Does anybody have an idea what I can do here?

EDIT - Complete example: Python script that creates the databases, and populates with initial data from senators.csv (file):

# -*- coding: utf-8 -*-
import sqlite3 
import csv

conn = sqlite3.connect('senators.db')
c = conn.cursor()
c.execute('''CREATE TABLE senators (id integer, name char)''')
c.execute('''CREATE TABLE polls (id integer, senator char, vote integer, FOREIGN KEY(senator) REFERENCES senators(name))''')

with open('senators.csv', encoding='utf-8') as f:
    f_csv = csv.reader(f)
    for row in f_csv:
        c.execute(u"INSERT INTO senators VALUES(?,?)", (row[1], row[0]))

conn.commit()
conn.close()

Script that populates the polls table, using Q1.txt (file).

import csv
import sqlite3
import re
import glob

conn = sqlite3.connect('senators.db')
c = conn.cursor()

POLLS = {
    'senator': 'votes/senator/Q*.txt',
    'deputee': 'votes/deputee/Q*.txt',
}

s_polls = glob.glob(POLLS['senator'])
d_polls = glob.glob(POLLS['deputee'])

for poll in s_polls:
    m = re.match('.*Q(\d+)\.txt', poll)
    poll_id = m.groups(0)

    with open(poll, encoding='utf-8') as p:
        f_csv = csv.reader(p)
        for row in f_csv:
            c.execute(u'SELECT id FROM senators WHERE name LIKE ?', ('%'+row[0]+'%',))
            data = c.fetchone()
            print(data) # I should not get None results here, but I do, exactly when the query has UTF-8 characters.

Note the file paths, if you want to test these scripts out.

Upvotes: 3

Sqlite3 cannot correctly query a UTF-8 string?

Answers (2)

Related Questions