stdklr
stdklr

Reputation: 53

SQLAlchemy and filtering with numpy datatypes

The SQLAlchemy .filter() function doesn't seem to be able to work with numpy datatypes. If I use a np.int32 in the filter argument the desired result is not achieved. Instead, I need to cast my np.int32 to int to make a query work as expected.

In the following example I query values from the database, do some selection via numpy and query again with the reduced selection (I'm aware that this is not a meaningful operation, it's just to demonstrate the issue)

import numpy as np

from sqlalchemy import Column, Integer, Float, ForeignKey
from sqlalchemy.ext.declarative import declarative_base

# ORM
Base = declarative_base()

class Example(Base):
    __tablename__ = 'Example'

    UID = Column(Integer, primary_key=True, autoincrement=True)
    Value = Column(Float)


from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('sqlite:///example.db', echo=False)
Session = sessionmaker(bind=engine)
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)

value = [1.1, 1.2, 2.1, 2.2]

session = Session()

for vi in value:
    example = Example(Value=vi)
    session.add(example)
    session.commit()


res = session.query(Example.UID, Example.Value).all()

uids = []
vals = []
for resi in res:
    uids.append(resi[0])
    vals.append(resi[1])

uids = np.array(uids) # creates a numpy array with elements of type np.int32
vals = np.array(vals)

idxsel = vals > 2
uids_sel = uids[idxsel]

for uidi in uids_sel:
    res = session.query(Example.Value).filter(Example.UID==uidi).all() # returns empty list
    print('without cast: {}'.format(res))
    res = session.query(Example.Value).filter(Example.UID==int(uidi)).all()
    print('with cast: {}'.format(res))

The result being

without cast: []
with cast: [(2.1,)] 
without cast: [] 
with cast: [(2.2,)]

I'm not sure, why the numpy datatype doesn't work. I never had any compatibility issues with numpy before. For me this is a rather serious issue. Of course you can cast, but since there is no warning or error this is really prone to serious trouble (accidentally missing cast) as an empty list is of course a valid response if your query doesn't match...

Since both numpy and sqlalchemy are rather widespread it would be great if they were compatible...

I really wonder that I found no mention of this problem anywhere via google. So maybe I'm missing something or doing something wrong. Any help is appreciated.

EDIT: The issue is with the sqlite3 API not with sqlalchemy. With the help of sqlalchemy we can however use TypeDecorators to work around the issue: https://github.com/sqlalchemy/sqlalchemy/issues/3586

EDIT: And here's a stackoverflow on how to deal with it without using sqlalchemy inserting numpy integer types into sqlite with python3

Upvotes: 1

Views: 1832

Answers (1)

Jake Stevens-Haas
Jake Stevens-Haas

Reputation: 1676

I spent a few hours running into the same underlying issue from a different angle (the where() clause). Although the above question was eventually edited with a mention of the TypeDecorator class as a solution, I wanted to implement that solution here (using sqlalchemy's Core API instead of ORM). Hopefully the expanded verbiage on the page improves the search result for this issue.

Similar MWE:

from pathlib import Path

import pandas as pd
from sqlalchemy import (
    create_engine,
    MetaData,
    Table,
    Column,
    Integer,
    String,
    select,
    insert,
)

eng = create_engine("sqlite:///"+str(Path().resolve() / 'temp.db'))
md = MetaData()
tbl = Table("users", md, Column("id", Integer), Column("name", String))
md.create_all(eng)
eng.execute(insert(tbl, (1,"Jack")))

stmt1 = select(tbl).where(tbl.c.id == 1)
print(eng.execute(stmt1).fetchall())

Returns as expected:

[(1, 'Jack')]

However:

target_id = pd.read_sql(stmt1, eng).loc[:,'id'].max()
stmt2 = select(tbl).where(tbl.c.id == target_id)
print(eng.execute(stmt2).fetchall())

returns:

[]

The selection fails to behave as expected because pandas.read_sql stores data as numpy types.

TypeDecorator solution

We can fix this with TypeDecorator

import sqlalchemy.types as types

class IntegerLike(types.TypeDecorator):

    impl = types.Integer
    cache_ok = True

    def process_bind_param(self, value, dialect):
        return int(value)

If we delete the temp.db we just created above and run the code with :

tbl = Table("users", md, Column("id", IntegerLike), Column("name", String))

Diagnostic Steps?

We can tell a little bit of what is going wrong by

print(stmt1)
SELECT users.id, users.name 
FROM users 
WHERE users.id = :id_1

The :id_1 is a bind parameter, and these are handled by the underlying DBAPI. (I don't know how to actually force the DBAPI to render the statement fully, so I'd welcome any comments or edits that provide a way)

Upvotes: 1

Related Questions