Reputation: 1
I have a table with over 10 million rows in Dremio. I have connected to it from Python PYODBC. I want to run a simple query like shown below:
SELECT REPORTDATE, TRANSDATE
FROM TABLE
WHERE TRANSDATE = '2020-01-05'
The issue is that it takes forever to run this query via Python. What would be the solution for this?
Upvotes: 0
Views: 268
Reputation: 10506
Use Oracle's python-oracledb driver instead of pyodbc and increase the arraysize
value, see https://python-oracledb.readthedocs.io/en/latest/user_guide/tuning.html#tuning-fetch-performance
cur = connection.cursor()
cur.arraysize = 5000
for row in cur.execute("SELECT * FROM very_big_table"):
print(row)
Unless you actually want a Pandas data frame, using the driver native calls will be fastest.
The 'Thin' mode (which is the default) of python-oracledb is likely to be faster than the 'Thick' mode.
Upvotes: 0
Reputation: 11
I would recommend using sqlalchemy or pandas to make the call.
Personally, I use pandas (below example uses cx_Oracle since we use Oracle servers):
import pandas as pd
qry = '''
SELECT REPORTDATE, TRANSDATE
FROM TABLE
WHERE TRANSDATE = '2020-01-05'
'''
dsn_tns = cx_Oracle.makedsn('host-name','port',service_name='database')
conn = cx_Oracle.connect(user=user_name, password=pwd, dsn=dsn_tns)
c = conn.cursor()
results = pd.read_sql(qry , conn)
c.close()
conn.close()
If it still runs too slow, you can pull it in chunks using the chunksize
attribute:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
Upvotes: 0