Reputation: 1922
My company has some tables that need to be updated from queries regularly. Normally I would turn the query into a view, but the destination schema does not have ubiquitous select access, it often takes a while for IT to help us out in that regard, and we need the tables set up quickly.
So, I've created a Python script that reads a .sql file, parses it into individual statements (which generally involve creating volatile tables), then executes those statements through SQLAlchemy. Lastly, the Python program generates a MERGE statement to update the destination table from the final volatile table in the .sql file. The Python looks like this:
import sys
import re
import time
from os.path import dirname
import sqlalchemy
import sqlalchemy_teradata
from my_stuff import CSpinner, end_program, Nudger, get_auth, create_update_text
t0 = time.clock()
# Assigning Variables
print('Assigning Variables')
user, pw, connstring, schema = get_auth()
td_engine = sqlalchemy.create_engine(f'teradata://{user}:{pw}@{connstring}/{schema}')
if __name__ == '__main__':
    initial_sql_file = rf'{dirname(__file__)}\{sys.argv[1]}'
    table_name = sys.argv[2]
    keys = eval(sys.argv[3])

    # Reading the SQL file to get the result set
    with open(initial_sql_file, 'r') as f:
        sql_raw = f.read()
    statements = re.findall(r'.+?;', sql_raw, re.S)
    cols = td_engine.execute(f'SEL TOP 1 * FROM {table_name}').keys()
    print(f'Assigned\n\n{table_name}')

    # Querying TD using the SQL read above
    with CSpinner('\nPerforming SQL Magic...', '...Performed'):
        with td_engine.connect() as conn:
            n = 0
            for sttmnt in statements:
                conn.execute(sttmnt)
                n += 1
                print(f'Statement {n} Complete')
            merge_sql = create_update_text('_data', table_name, cols, keys)
            print(merge_sql)
            conn.execute(merge_sql)
            time.sleep(.1)

    # ends the program
    end_program(t0)
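The statement-splitting step can be sketched in isolation; the sample file contents below are hypothetical. Note that the `r'.+?;'` pattern splits on every semicolon, including any inside string literals or comments, so the .sql files must avoid those:

```python
import re

# A hypothetical two-statement .sql file, read into one string.
sql_raw = (
    "CREATE VOLATILE TABLE _stage AS (SEL * FROM SCHEMA.SRC) WITH DATA\n"
    "ON COMMIT PRESERVE ROWS;\n"
    "CREATE VOLATILE TABLE _data AS (SEL * FROM _stage) WITH DATA\n"
    "ON COMMIT PRESERVE ROWS;\n"
)

# Non-greedy match up to each semicolon; re.S lets '.' cross newlines.
statements = re.findall(r'.+?;', sql_raw, re.S)
print(len(statements))  # 2
```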
Given that I previously had issues with the MERGE statements, I have create_update_text return sqlalchemy.text(statement).execution_options(autocommit=True), as you can see below:
def create_update_text(temp, perm, df, keys, how='both'):
    cols = [col for col in df]
    for key in keys:
        assert key in cols, f'ID {key} not in Columns'
    joined_keys = ' AND '.join(f'p."{key}" = t."{key}"' for key in keys)
    sql_text = f'MERGE INTO {perm} p\n' \
               f'USING {temp} t\n' \
               f'ON {joined_keys}\n'
    if how in {'both', 'matched'}:
        sql_text += 'WHEN MATCHED THEN\n' \
                    '  UPDATE\n' \
                    '  SET\n    '
        sql_text += ',\n    '.join(f'"{col}" = t."{col}"' for col in cols if col not in keys)
    if how in {'both', 'not'}:
        sql_text += '\nWHEN NOT MATCHED THEN\n' \
                    '  INSERT (\n    '
        sql_text += ',\n    '.join(f'"{col}"' for col in cols)
        sql_text += '\n  )\n' \
                    '  VALUES (\n    '
        sql_text += ',\n    '.join(f't."{col}"' for col in cols)
        sql_text += '\n  )'
    return sqlalchemy.text(sql_text).execution_options(autocommit=True)
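A quick way to sanity-check the generated statement is to reproduce the string assembly without the sqlalchemy.text wrapper; the table and column names here are hypothetical:

```python
# Standalone sketch of the string assembly above, minus the
# sqlalchemy.text(...) wrapper, with hypothetical names.
cols = ['PART_ID', 'REMOVAL_DATE', 'FLAG_SUM']
keys = ['PART_ID', 'REMOVAL_DATE']

joined_keys = ' AND '.join(f'p."{key}" = t."{key}"' for key in keys)
sql_text = f'MERGE INTO MY_TABLE p\nUSING _data t\nON {joined_keys}\n'
sql_text += 'WHEN MATCHED THEN\n  UPDATE\n  SET\n    '
sql_text += ',\n    '.join(f'"{col}" = t."{col}"' for col in cols if col not in keys)
sql_text += '\nWHEN NOT MATCHED THEN\n  INSERT (\n    '
sql_text += ',\n    '.join(f'"{col}"' for col in cols)
sql_text += '\n  )\n  VALUES (\n    '
sql_text += ',\n    '.join(f't."{col}"' for col in cols)
sql_text += '\n  )'
print(sql_text)
```

Printing it before executing (as the main script does) makes it easy to paste the exact same text into TD Studio for comparison.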
A short example .sql file would look like:
CREATE VOLATILE TABLE _data AS (
    SEL
        PART_ID,
        REMOVAL_DATE,
        SUM(FLAGS) AS FLAG_SUM,
        CASE
            WHEN MECHANIC = 'JOHN' THEN 1
            ELSE 0
        END AS JOHNORNOT
    FROM SCHEMA.TABLE
    GROUP BY 1,2,4
) WITH DATA
PRIMARY KEY (PART_ID, REMOVAL_DATE, JOHNORNOT)
ON COMMIT PRESERVE ROWS
;
I've run into a problem where running the update with Python submitting the statements yields different results compared to pasting the query and the Python-generated MERGE statement into Teradata Studio and running them as individual statements. For example, the SUM fields may come up with different values, or the binary CASE expression won't evaluate the same way.
The question is: if I'm running exactly the same query in TD Studio as I am with SQLAlchemy, why does it yield different results? Is there something about my program that could be affecting the data?
To compare succinctly, I've put the TD and PY data in separate tables and run a query that joins the two wherever the numbers disagree, which looks like this:
SELECT
    td.PART_ID,
    td.MONTH,
    td.YEAR,
    td.REMOVALS AS TD_Removals,
    py.REMOVALS AS PY_Removals,
    td.FAILURES AS TD_Failures,
    py.FAILURES AS PY_Failures
FROM TD_Data td
JOIN PY_Data py
    ON td.PART_ID = py.PART_ID
    AND td.MONTH = py.MONTH
    AND td.YEAR = py.YEAR
    AND (td.REMOVALS <> py.REMOVALS
         OR td.FAILURES <> py.FAILURES)
From what I can tell, the PY_ numbers that do not match are always lower than the corresponding TD_ values. Some sample data is shown below:
PART_ID MONTH YEAR TD_Removals PY_Removals TD_Fails PY_Fails
26-3132-9-0005 7 2015 2 1 0 0
26-2350-9-0001 3 2015 15 12 11 11
43-3614-9-0002 1 2017 2 0 0 0
97-2373-9-0001 3 2016 8 2 1 1
26-7410-9-0001 7 2016 6 1 0 0
26-3155-9-0003 9 2015 1 0 0 0
97-3510-9-0001 7 2017 28 26 0 0
97-2792-9-0006 6 2017 3 2 0 0
26-7933-9-0001 10 2015 3 0 0 0
97-2313-9-0002 3 2016 15 14 13 13
29-2800-9-0009 6 2017 3 2 0 0
26-3242-9-0006 7 2016 7 0 0 0
Per @dnoeth's comment below, I ran SELECT Transaction_Mode FROM dbc.sessioninfoV WHERE SessionNo = SESSION; in both Python and TD Studio. TD Studio yields A and Python yields T. The question has now become which is more correct, and how to ensure similarity in the future.
Upvotes: 0
Views: 491
Reputation: 60462
Based on the comments, the sessions from Python and Studio use different transaction modes:
SELECT Transaction_Mode -- T=Teradata, A=ANSI mode
FROM dbc.sessioninfoV
WHERE SessionNo = SESSION;
Session mode is set in JDBC through the TMODE property; the possible values are TERA, ANSI and DEFAULT.
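The question used sqlalchemy_teradata; with the newer teradatasql/teradatasqlalchemy driver stack, the same setting is exposed as a tmode connection parameter. A minimal sketch of pinning the mode in the SQLAlchemy URL, assuming that driver and placeholder credentials:

```python
# Sketch: pin the session mode in the connection URL so the Python
# session behaves like the TD Studio one. 'tmode' accepts ANSI, TERA
# or DEFAULT in the teradatasql driver; credentials are placeholders.
user, pw, host = 'me', 'secret', 'tdhost'
url = f'teradatasql://{user}:{pw}@{host}/?tmode=ANSI'
print(url)
```

Whichever mode you pick, setting it explicitly on every client keeps the results comparable.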
One of the differences is case sensitivity of literals: in a Teradata session comparison is case insensitive, while an ANSI session defaults to case sensitive. This explains a different number of matches for WHEN MECHANIC = 'JOHN' THEN 1 when the data contains both JOHN and John.
Comparison of Transactions in ANSI and Teradata Session Modes
When a column is defined as NOT CASESPECIFIC in the CREATE TABLE, you must either switch to Teradata mode or add (NOT CASESPECIFIC) after each string literal, e.g. WHEN MECHANIC = 'JOHN' (NOT CASESPECIFIC) THEN 1.
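The effect on the CASE expression can be mimicked in plain Python with hypothetical data: ANSI mode compares the literal exactly, while Teradata mode / NOT CASESPECIFIC behaves like a case-folded comparison, so the ANSI session matches fewer rows:

```python
mechanics = ['JOHN', 'John', 'john', 'JANE']  # hypothetical column values

# ANSI mode: case-sensitive literal comparison.
ansi_matches = sum(1 for m in mechanics if m == 'JOHN')

# Teradata mode / NOT CASESPECIFIC: effectively case-insensitive.
tera_matches = sum(1 for m in mechanics if m.upper() == 'JOHN')

print(ansi_matches, tera_matches)  # 1 3
```

That asymmetry matches the symptom in the question: the Python (ANSI-mode) counts are always lower than or equal to the TD Studio (Teradata-mode) counts.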
For all differences between Teradata and ANSI session modes, see the Transaction Processing chapter in the manuals.
Upvotes: 1