geudrik
geudrik

Reputation: 672

Python-Sqlalchemy Binary Column Type HEX() and UNHEX()

I'm attempting to learn Sqlalchemy and utilize an ORM. One of my columns stores file hashes as binary. In SQL, the select would simply be

SELECT type, column FROM table WHERE hash = UNHEX('somehash')

How do I achieve a select like this (ideally with an insert example, too) using my ORM? I've begun reading about column overrides, but I'm confused/not certain that that's really what I'm after.

eg res = session.query.filter(Model.hash == __something__? )

Thoughts?

Upvotes: 4

Views: 9807

Answers (3)

TrilceAC
TrilceAC

Reputation: 323

I really like iuridiniz approach A better approach: Custom column that converts data by using sql functions, but I had some trouble making it work when using BINARY and VARBINARY to store hex strings in MySQL 5.7. I tried different things, but SQLAlchemy kept complaining about the encoding, and/or the use of func.HEX and func.UNHEX in contexts where they couldn't be used. Using python3 and SQLAlchemy 1.2.8, I managed to make it work extending the base class and replacing its processors, so that sqlalchemy does not require a function from the database to bind the data and compute the result, but rather it is done within python, as follows:

import codecs
from sqlalchemy.types import VARBINARY

class VarBinaryHex(VARBINARY):
    """Extend VARBINARY to handle hex strings."""

    impl = VARBINARY

    def bind_processor(self, dialect):
        """Return a processor that decodes hex values."""
        def process(value):
            return codecs.decode(value, 'hex')
        return process

    def result_processor(self, dialect, coltype):
        """Return a processor that encodes hex values."""
        def process(value):
            return codecs.encode(value, 'hex')
        return process

    def adapt(self, impltype):
        """Produce an adapted form of this type, given an impl class."""
        return VarBinaryHex()

The idea is to replace HEX and UNHEX, which require DBMS intervention, with python functions that do just the same, encode and decode an hex string just like HEX and UNHEX do. If you directly connect to the database, you can use HEX and UNHEX, but from SQLAlchemy, codecs.enconde and codecs.decode functions make the work for you.

I bet that, if anybody were interested, writting the appropriate processors, one could even manage the hex values as integers from the python perspective, allowing to store integers that are greater the BIGINT.

Some considerations:

  • BINARY could be used instead of VARBINARY if the length of the hex string is known.
  • Depending on what you are going to do, it might worth to un-/capitalise the string on the constructor of class that is going to use this type of column, so that you work with a consistent capitalization, right at the moment of the object initialization. i.e., 'aa' != 'AA' but 0xaa == 0xAA.
  • As said before, you could consider a processor that converts db binary hex values to prython integer.
  • When using VARBINARY, be careful because 'aa' != '00aa'
  • If you use BINARY, lets say that your column is col = Column(BinaryHex(length=4)), take into account that any value that you provide with less than length bytes will be completed with zeros. I mean, if you do obj.col = 'aabb' and commit it, when you later retrieve this, from the dataase, what you will get is obj.col == 'aabb0000', which is something quite different.

Upvotes: 1

muirbot
muirbot

Reputation: 2081

I wasn't able to get @iuridiniz's Custom column solution to work because of the following error:

sqlalchemy.exc.StatementError: (builtins.TypeError) encoding without a string argument

For an expression like:

m = Model(col='FFFF')
session.add(m)
session.commit()

I solved it by overriding process_bind_param, which processes the parameter before passing it to bind_expression for interpolation into your query language.

from sqlalchemy.types import VARCHAR
from sqlalchemy import func

class HashColumn(VARCHAR):

    def process_bind_param(self, value, dialect):
        # encode value as a binary
        if value:
            return bytes(value, 'utf-8')

    def bind_expression(self, bindvalue):
        # convert the bind's type from String to HEX encoded
        return func.HEX(bindvalue)

    def column_expression(self, col):
        # convert select value from HEX encoded to String
        return func.UNHEX(col)

And then defining the table is the same:

from sqlalchemy import Column, types
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Model(Base):
    __tablename__ = "model"
    id = Column(types.Integer, primary_key=True)
    col = Column(HashColumn(20))

    def __repr__(self):
        return "Model(col=%r)" % self.col

Upvotes: 1

iuridiniz
iuridiniz

Reputation: 2423

Only for select's and insert's

Well, for select you could use:

>>> from sqlalchemy import func
>>> session = (...)
>>> (...)
>>> engine = create_engine('sqlite:///:memory:', echo=True)
>>> q = session.query(Model.id).filter(Model.some == func.HEX('asd'))
>>> print q.statement.compile(bind=engine)
SELECT model.id
FROM model
WHERE model.some = HEX(?)

For insert:

>>> from sqlalchemy import func
>>> session = (...)
>>> (...)
>>> engine = create_engine('sqlite:///:memory:', echo=True)
>>> m = new Model(hash=func.HEX('asd'))
>>> session.add(m)
>>> session.commit()
INSERT INTO model (hash) VALUES (HEX(%s))

A better approach: Custom column that converts data by using sql functions

But, I think the best for you is a custom column on sqlalchemy using any process_bind_param, process_result_value, bind_expression and column_expression see this example.

Check this code below, it create a custom column that I think fit your needs:

from sqlalchemy.types import VARCHAR
from sqlalchemy import func

class HashColumn(VARCHAR):

    def bind_expression(self, bindvalue):
        # convert the bind's type from String to HEX encoded 
        return func.HEX(bindvalue)

    def column_expression(self, col):
        # convert select value from HEX encoded to String
        return func.UNHEX(col)

You could model your a table like:

from sqlalchemy import Column, types
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Model(Base):
    __tablename__ = "model"
    id = Column(types.Integer, primary_key=True)
    col = Column(HashColumn(20))

    def __repr__(self):
        return "Model(col=%r)" % self.col

Some usage:

>>> (...)
>>> session = create_session(...)
>>> (...)
>>> model = Model(col='Iuri Diniz')
>>> session.add(model)
>>> session.commit()

this issues this query:

INSERT INTO model (col) VALUES (HEX(?)); -- ('Iuri Diniz',)

More usage:

>>> session.query(Model).first()
Model(col='Iuri Diniz')

this issues this query:

SELECT 
    model.id AS model_id, UNHEX(model.col) AS model_col 
FROM model 
LIMIT ? ; -- (1,)

A bit more:

>>> session.query(Model).filter(Model.col == "Iuri Diniz").first()
Model(col='Iuri Diniz')

this issues this query:

SELECT 
    model.id AS model_id, UNHEX(model.col) AS model_col 
FROM model 
WHERE model.col = HEX(?) 
LIMIT ? ; -- ('Iuri Diniz', 1)

Extra: Custom column that converts data by using python types

Maybe you want to use some beautiful custom type and want to convert it between python and the database.

In the following example I convert UUID's between python and the database (the code is based on this link):

import uuid
from sqlalchemy.types import TypeDecorator, VARCHAR

class UUID4(TypeDecorator):
    """Portable UUID implementation

    >>> str(UUID4())
    'VARCHAR(36)'
    """

    impl = VARCHAR(36)

    def process_bind_param(self, value, dialect):
        if value is None:
            return value
        else:
            if not isinstance(value, uuid.UUID):
                return str(uuid.UUID(value))
            else:
                # hexstring
                return str(value)

    def process_result_value(self, value, dialect):
        if value is None:
            return value
        else:
            return uuid.UUID(value)

Upvotes: 11

Related Questions