Xwnola
Xwnola

Reputation: 61

read_sql_table in Dask returns NoSuchTableError

I have a read_sql using pandas and it works fine. However, when I tried to re-create the same dataframe under Dask using the same logic. It gives me NoSuchTableError. I know for sure the table exists in my SQL database.

pandas #works:

import urllib
import sqlalchemy as sa
import pandas as pd

sql = "SELECT * FROM my_table"
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};\
                             SERVER=my_server;\
                             DATABASE=db_name;\
                             Trusted_Connection=yes;")
engine = sa.create_engine('mssql+pyodbc:///?odbc_connect=%s' % params)
df = pd.read_sql(sql, engine)
print(df.head())

Since Dask is using the full URL from sqlalchemy, I also tried to re-create the same connection in sqlalchemy and it works. It just puzzles me why it does not work in Dask.

sqlalchemy #works:

import pyodbc
import sqlalchemy as sal
from sqlalchemy import create_engine

engine = sal.create_engine('mssql+pyodbc://my_server/db_name\
         ?driver=SQL+Server+Native+Client+11.0?trusted_connection=yes')

result = engine.execute("select * from my_table")

for row in result:
    print(row[0])

Dask #NoSuchTableError:

import urllib
import sqlalchemy as sa
import dask.dataframe as dd
from sqlalchemy.engine.url import make_url

params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};\
                             SERVER=my_server;\
                             DATABASE=db_name;\
                             Trusted_Connection=yes;")
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
url = make_url(conn_str)
df = dd.read_sql_table('my_table', url, index_col='ID')
print(df.head())

Has anyone came across the same/similar issue? Any thought is much appreciated! Thanks in advance.

Upvotes: 6

Views: 1263

Answers (1)

Xu Ren
Xu Ren

Reputation: 39

without knowing further details about how your SQL Server is set up, I believe this will be SQL Server specific from the Dask documentation, you need to provide the schema= keyword, like this:

dftest = dd.read_sql_table(table="table_name_only", uri=uri, index_col="somekey", schema="schema_name", divisions=[1,2,3])

Note that uri here is the SQLAlchemy string, not a connection object.

https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_sql_table

Upvotes: 3

Related Questions