Smith
Smith

Reputation: 21

Unicode utf-16/8 mess with pyodbc

I am using pyodbc (I am afraid I don't know what version) in combination with MS Access and Python 3.2.5.

ODBC Version is unixODBC 2.2.14, mdb driver msodbcsql-11.0.2270 on a RHEL6 system

Now the big issue is that if I try and query my table, it returns the values in gibberish.

Example, SELECT * FROM table returns:

('ä¹ã', 'ã¶ã°ã°ã°ãµã°')

Now this by itself is not a big problem, I can just encode it using Python strings own encode method.

def _enc16(self, value):
        return value.encode('utf-16')[2::]

Although I'm telling the ODBC driver that I want to use utf-16 (I thought this was python standard?)

Excerpt from my odbc.ini:

[mdb]
Description = Microsoft Access
Driver = MDBToolsODBC
Database = /path/to/file.mdb
Servername = localhost
Charset = UTF16
PORT = 5432

Now this is where the problems start. If I try and query the table using WHERE clauses, it most likely cannot compare my Python string with the table content, because they're encoded differently.

So SELECT * FROM table WHERE id = '1' returns None, even though there is a corresponding record (I copied the statement into Access and the record is found).

So what is the best approach here? Configure ODBC so that it returns correct encoding (how? I fiddle around with this already for a while) or encode my search criteria (turning '1' into 'ä¹ã', but how?)?

Thanks for your help!

Upvotes: 1

Views: 690

Answers (1)

Smith
Smith

Reputation: 21

Ok, so I solved it using the SELECT * FROM table statement and feeding the results into a list of lists. I did a simple search over this list to filter out the unique record I'm looking for. I know this is a bit tedious but it works.

rows = [[b'' if value is None else self._enc16(value) for value in row] for row in cursor.fetchall()]

for row in rows:
  for i, entry in enumerate(row):
    row[i] = str(self._dec8(entry))
  if row[0] == str(samplenr):
    break

As you can see, I solved the encoding issue by encoding the scrambled results with UTF-16 and reencoding them to UTF-8 so Python can turn it into a readable string.

Upvotes: 1

Related Questions