MMCM_
MMCM_

Reputation: 639

Performance SQLAlchemy and or

I use the following sqlalchemy code to retrieve some data from a database

q = session.query(hd_tbl).\
    join(dt_tbl, hd_tbl.c['data_type'] == dt_tbl.c['ID']).\
    filter(or_(and_(hd_tbl.c['object_id'] == get_id(row['object']),
                    hd_tbl.c['data_type'] == get_id(row['type']),
                    hd_tbl.c['data_provider'] == get_id(row['provider']),
                    hd_tbl.c['data_account'] == get_id(row['account']))
           for index, row in data.iterrows())).\
    with_entities(hd_tbl.c['ID'], hd_tbl.c['object_id'],
                  hd_tbl.c['data_type'], hd_tbl.c['data_provider'],
                  hd_tbl.c['data_account'], dt_tbl.c['value_type'])

where hd_tbland dt_tbl are two tables in sql db, and datais pandas dataframe containing typically around 1k-9k entries. hd_tbl contains at the moment around 90k rows.

The execution time seems to exponentially grow with the length of data. The corresponding sql statement (by sqlalchemy) looks as follows:

SELECT data_header.`ID`, data_header.object_id, data_header.data_type, data_header.data_provider, data_header.data_account, basedata_data_type.value_type 
FROM data_header INNER JOIN basedata_data_type ON data_header.data_type = basedata_data_type.`ID` 
WHERE data_header.object_id = %s AND data_header.data_type = %s AND data_header.data_provider = %s AND data_header.data_account = %s OR 
data_header.object_id = %s AND data_header.data_type = %s AND data_header.data_provider = %s AND data_header.data_account = %s OR
 ...
 data_header.object_id = %s AND data_header.data_type = %s AND data_header.data_provider = %s AND data_header.data_account = %s OR 

The tables and columns are fully indexed, and performance is not satisfying. Currently it is way faster to read all the data of hd_tbl and dt_tbl into memory and merge with pandas merge function. However, this is seems to be suboptimal. Anyone having an idea on how to improve the sqlalchemy call?

EDIT: I was able to improve performance signifcantly by using sqlalchemy tuple_ in the following way:

header_tuples = [tuple([int(y) for y in tuple(x)]) for x in
                 data_as_int.values]
q = session.query(hd_tbl). \
    join(dt_tbl, hd_tbl.c['data_type'] == dt_tbl.c['ID']). \
    filter(tuple_(hd_tbl.c['object_id'], hd_tbl.c['data_type'],
                  hd_tbl.c['data_provider'],
                  hd_tbl.c['data_account']).in_(header_tuples)). \
    with_entities(hd_tbl.c['ID'], hd_tbl.c['object_id'],
                  hd_tbl.c['data_type'], hd_tbl.c['data_provider'],
                  hd_tbl.c['data_account'], dt_tbl.c['value_type'])

with corresponding query...

SELECT data_header.`ID`, data_header.object_id, data_header.data_type, data_header.data_provider, data_header.data_account, basedata_data_type.value_type 
FROM data_header INNER JOIN basedata_data_type ON data_header.data_type = basedata_data_type.`ID` 
WHERE (data_header.object_id, data_header.data_type, data_header.data_provider, data_header.data_account) IN ((%(param_1)s, %(param_2)s, %(param_3)s, %(param_4)s), (%(param_5)s, ...))

Upvotes: 0

Views: 320

Answers (1)

anton
anton

Reputation: 257

I'd recommend you create a composite index on fields object_id, data_type, data_provider, ... with the same order, which they are placed in table, and make sure they're following in the same order in your WHERE condition. It may speed-up a bit your requests by cost of the disk space.

Also you may use several consequent small SQL requests instead a large query with complex OR condition. Accumulate extracted data on the application side or, if amount is large enough, in a fast temporary storage (a temporary table, noSQL, etc.)

In addition you may check MySQL configuration and increase values, related to memory volume per a thread, request, etc. A good idea is to check is your composite index fits into available memory, or it is useless.

I guess DB tuning may help a lot to increase productivity. Otherwise you may analyze your application's architecture to get more significant results.

Upvotes: 1

Related Questions