Eric Ed Lohmar
Eric Ed Lohmar

Reputation: 1922

sqlalchemy and Teradata Studio yielding different results from the same statements

My company has some tables that need to be updated from queries regularly. Normally, I would make the query into a view, but the destination schema does not have ubiquitous select access and it often takes a while for IT for help us out in that regard and we need the tables setup quickly.

So, I've created a python script that reads a .sql file, parses it into individual statements which generally involve creating volatile tables, then executes those statements through sqlalchemy. Lastly the python program generates a MERGE statement to update the destination table from the final volatile table in the .sql file. The python looks like this:

import sys
import re
import time
from os.path import dirname

import sqlalchemy
import sqlalchemy_teradata
from my_stuff import CSpinner, end_program, Nudger, get_auth, create_update_text

t0 = time.clock()

# Assigning Variables
print('Assigning Variables')
user, pw, connstring, schema = get_auth()
td_engine = sqlalchemy.create_engine(f'teradata://{user}:{pw}@{connstring}/{schema}')

if __name__ == '__main__':
    initial_sql_file = rf'{dirname(__file__)}\{sys.argv[1]}'
    table_name = sys.argv[2]
    keys = eval(sys.argv[3])

    # Reading the SQL file to get the result set
    with open(initial_sql_file, 'r') as f:
        sql_raw = f.read()

    statements = re.findall(r'.+?;', sql_raw, re.S)
    cols = td_engine.execute(f'SEL TOP 1* from {table_name}').keys()
    print(f'Assigned\n\n{table_name}')

    # Querying TD using the SQL read above
    with CSpinner('\nPerforming SQL Magic...', '...Performed'):
        with td_engine.connect() as conn:
            n = 0
            for sttmnt in statements:
                conn.execute(sttmnt)
                n += 1
                print(f'Statement {n} Complete')
            merge_sql = create_update_text('_data', table_name, cols, keys)
            print(merge_sql)
            conn.execute(merge_sql)
    time.sleep(.1)

    # ends the program
    end_program(t0)

Given that I previously had issues with the MERGE statements, I have create_update_text return sqlalchemy.text(statement).execution_options(autocommit=True) as you can see below:

def create_update_text(temp, perm, df, keys, how='both'):
    cols = [col for col in df]

    for key in keys:
        assert key in cols, f'ID {key} not in Columns'

    joined_keys = ' AND '.join(f'p."{key}" = t."{key}"' for key in keys)
    sql_text = f'MERGE INTO {perm} p\n' \
               f'USING {temp} t\n' \
               f'ON {joined_keys}\n'

    if how in {'both', 'matched'}:
        sql_text += 'WHEN MATCHED THEN\n' \
                    ' UPDATE\n' \
                    ' SET\n     '

        sql_text += ',\n     '.join(f'"{col}" = t."{col}"' for col in cols if col not in keys)

    if how in {'both', 'not'}:
        sql_text += '\nWHEN NOT MATCHED THEN\n' \
                    ' INSERT (\n     '

        sql_text += ',\n     '.join(f'"{col}"' for col in cols)

        sql_text += '\n )\n' \
                    ' VALUES (\n     '

        sql_text += ',\n     '.join(f't."{col}"' for col in cols)

        sql_text += '\n )'

    return sqlalchemy.text(sql_text).execution_options(autocommit=True)

A short example .sql file would look like:

CREATE VOLATILE TABLE _data as (
    SEL
     PART_ID,
     REMOVAL_DATE,
     SUM(FLAGS) AS FLAG_SUM,

     CASE
      WHEN MECHANIC = 'JOHN' THEN 1
      ELSE 0
     END AS JOHNORNOT

    FROM SCHEMA.TABLE

    GROUP BY 1,2,4
) WITH DATA
  PRIMARY KEY (PART_ID,REMOVAL_DATE,JOHNORNOT)
  ON COMMIT PRESERVE ROWS
;

I've run into a problem where running the update with python submitting the statements is yielding different results compared to pasting the query and the python-generated MERGE statement into Teradata Studio and running them as individual statements. For example, the SUM fields may come up with different values or the binary CASE statement won't execute the same.

The question is, if I'm running exactly the same query in TD Studio as I am with SQLAlchemy, why is it yielding different results? Is something about my program that can be affecting the data?

In order to compare succinctly, I've put the TD and PY data in separate tables and run a query that merges the two which looks like this:

Select
 td.PART_ID
 td.MONTH
 td.YEAR
 td.REMOVALS as TD_Removals
 py.REMOVALS as PY_Removals
 td.FAILURES as TD_Failures
 py.FAILURES as PY_Failues

from TD_Data td

join PY_Data py
  on td.PART_ID = py.PART_ID
 and td.MONTH = py.MONTH
 and td.YEAR = py.YEAR
 and (td.REMOVALS <> py.REMOVALS
      or td.FAILURES <> py.FAILURES)

From what I can tell the PY_ numbers that do not match are always lower than the corresponding TD_ value. Some sample data is shown below:

PART_ID        MONTH YEAR TD_Removals PY_Removals TD_Fails PY_Fails
26-3132-9-0005 7     2015 2           1           0        0
26-2350-9-0001 3     2015 15          12          11       11
43-3614-9-0002 1     2017 2           0           0        0
97-2373-9-0001 3     2016 8           2           1        1
26-7410-9-0001 7     2016 6           1           0        0
26-3155-9-0003 9     2015 1           0           0        0
97-3510-9-0001 7     2017 28          26          0        0
97-2792-9-0006 6     2017 3           2           0        0
26-7933-9-0001 10    2015 3           0           0        0
97-2313-9-0002 3     2016 15          14          13       13
29-2800-9-0009 6     2017 3           2           0        0
26-3242-9-0006 7     2016 7           0           0        0

Per @dnoeth's comment below, I ran SELECT Transaction_Mode FROM dbc.sessioninfoV WHERE SessionNo = SESSION; on both python and TD_Studio. TD_Studio yields A and python yields T. The question has now become which is more correct and how to ensure similarity in the future.

Upvotes: 0

Views: 491

Answers (1)

dnoeth
dnoeth

Reputation: 60462

Based on the comments the sessions from Python/Studio use a different transaction mode:

 SELECT Transaction_Mode -- T=Teradata, A=ANSI mode
 FROM dbc.sessioninfoV
 WHERE SessionNo = SESSION;

Session mode is set in JDBC through the TMODE property, the possible values are TERA, ANSI & DEFAULT.

One of the differences is case sensitivity of literals, in a Teradata session it's case insensitive while an ANSI session defaults to case sensitive, this explains a different number of matches for WHEN MECHANIC = 'JOHN' THEN 1 when there's JOHN and John.

Comparison of Transactions in ANSI and Teradata Session Modes

When a column is defined as NOT CASESPECIFIC in the CREATE TABLE you must either switch to Teradata mode or add (NOT CASESPECIFIC) after each string literal, e.g. WHEN MECHANIC = 'JOHN' (NOT CASESPECIFIC) THEN 1.

For all differences between Teradata and ANSI session see the Transaction Processing chapter in the manuals.

Upvotes: 1

Related Questions