segmentationfault
segmentationfault

Reputation: 518

Why does count() return 0 when accessing a character field larger than 511 using pyodbc?

I was trying to count the number of occurrences of a substring within a character field of a database (PostgreSQL, pyodbc, Python 3.4.4, Windows 7). But when the character field was larger than 511, count() was always returning zero. Only after "doing something" with the returned string (e.g. accessing the string at index 0 or using print() to look at the string), count() was returning the expected value. There were no problems with strings of length 511 or less.

Furthermore the size of the (512 or larger) string seems to change, see the example below.

This problem seems to occur with PostgreSQL, SQLite and Oracle when using pyodbc. I wasn't able to reproduce it using psycopg2.

So, is it a bug in pyodbc? Or is it some kind of optimization and count() does have issues with it? (Other functions seem to be ok with it, for example len().)

The following Python script will reproduce this problem using either PostgreSQL, SQLite-ODBC or Oracle. It will create two tables, insert text and try to use the function count() on the returned data.

import pyodbc
import sys

#conn   = pyodbc.connect('driver={SQLite3 ODBC Driver}; server=localhost; database=D:\\test.db;')
#conn   = pyodbc.connect('DSN=test-oracle;uid=xx;pwd=xx')
conn   = pyodbc.connect('DSN=test-postgresql;uid=xx;pwd=xx')
cursor = conn.cursor()
with conn.cursor() as cursor:

    cursor.execute("create table testtable511 (txt char(511) default ' ' primary key not NULL);")
    cursor.execute("insert into testtable511 (txt) values ('"+511*"t"+"');")
    cursor.execute("create table testtable512 (txt char(512) default ' ' primary key not NULL);")
    cursor.execute("insert into testtable512 (txt) values ('"+512*"t"+"');")

    cursor.execute('select * from testtable511')
    data511  = cursor.fetchone()
    print('511')
    print(80*'#')
    # count is 511, size is 560
    print('counting t before "accessing" string of testtable511:     ', data511[0].count('t'))
    print('size of string before "accessing" string of testtable511: ', sys.getsizeof( data511[0] ))
    data511[0][0]
    # count is 511, size is 560
    print('counting t after "accessing" string of testtable511:      ',  data511[0].count('t'))
    print('size of string after "accessing" string of testtable511:  ',  sys.getsizeof( data511[0] ))
    print(80*'#')

    print()

    cursor.execute('select * from testtable512')
    data512 = cursor.fetchone()
    print('512')
    print(80*'#')
    # count is 0, size is 1106
    print('counting t before "accessing" string of testtable512:     ', data512[0].count('t'))
    print('size of string before "accessing" string of testtable512: ', sys.getsizeof( data512[0] ))
    data512[0][0]
    # count is 512, size is 593
    print('counting t after "accessing" string of testtable512:      ', data512[0].count('t'))
    print('size of string after "accessing" string of testtable512:  ', sys.getsizeof( data512[0] ))
    print(80*'#')

    cursor.execute("drop table testtable511;")
    cursor.execute("drop table testtable512;")

conn.close()

UPDATE: the problem was with pyodbc 3.0.10. Version 4.0.11 fixes the issue.

Upvotes: 2

Views: 119

Answers (1)

segmentationfault
segmentationfault

Reputation: 518

The problem is fixed in pyodbc 4.0.11.

So it seems this was a bug with earlier pyodbc versions (I was using 3.0.10).

Upvotes: 2

Related Questions