ravindu93

Reputation: 1

Querying a SQL Table with over 10 million rows

I have a table with over 10 million rows in Dremio. I have connected to it from Python using pyodbc. I want to run a simple query like the one shown below:

SELECT REPORTDATE, TRANSDATE 
FROM TABLE 
WHERE TRANSDATE = '2020-01-05'

The issue is that it takes forever to run this query via Python. What would be the solution for this?

Upvotes: 0

Views: 268

Answers (2)

Christopher Jones

Reputation: 10506

Use Oracle's python-oracledb driver instead of pyodbc and increase the arraysize value; see https://python-oracledb.readthedocs.io/en/latest/user_guide/tuning.html#tuning-fetch-performance

cur = connection.cursor()

cur.arraysize = 5000

for row in cur.execute("SELECT * FROM very_big_table"):
    print(row)

Unless you actually want a Pandas data frame, using the driver native calls will be fastest.

The 'Thin' mode (which is the default) of python-oracledb is likely to be faster than the 'Thick' mode.

Upvotes: 0

Abe

Reputation: 11

I would recommend using sqlalchemy or pandas to make the call.

Personally, I use pandas (below example uses cx_Oracle since we use Oracle servers):

import cx_Oracle
import pandas as pd

qry = '''
SELECT REPORTDATE, TRANSDATE
FROM TABLE
WHERE TRANSDATE = '2020-01-05'
'''

# Build the DSN and connect; pd.read_sql handles the cursor internally,
# so no explicit cursor is needed
dsn_tns = cx_Oracle.makedsn('host-name', 'port', service_name='database')
conn = cx_Oracle.connect(user=user_name, password=pwd, dsn=dsn_tns)
results = pd.read_sql(qry, conn)
conn.close()

If it still runs too slowly, you can pull the results in chunks using the chunksize parameter of pandas.read_sql:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html
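A minimal runnable sketch of chunked reading, using an in-memory SQLite database as a stand-in for the real server (the table name and data here are made up for illustration):

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real Dremio/Oracle server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trans (REPORTDATE TEXT, TRANSDATE TEXT)")
conn.executemany(
    "INSERT INTO trans VALUES (?, ?)",
    [("2020-01-06", "2020-01-05")] * 10_000,
)

qry = "SELECT REPORTDATE, TRANSDATE FROM trans WHERE TRANSDATE = '2020-01-05'"

# With chunksize set, read_sql returns an iterator of DataFrames instead of
# loading the entire result set into memory at once
total_rows = 0
for chunk in pd.read_sql(qry, conn, chunksize=2_000):
    total_rows += len(chunk)  # process each chunk, then let it be freed

print(total_rows)  # 10000
conn.close()
```

Each iteration yields at most `chunksize` rows, which keeps peak memory bounded no matter how large the full result set is.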

Upvotes: 0
