Shankar Guru
Shankar Guru

Reputation: 1161

TypeError seen when using ddf.read_sql_table - dask

dask==2.27.0
python==3.8.3
Operating System== centos7
  1. If its possible to pass sqlalchemy text which contains query for ddf.read_sql_table.
  2. Seeing TypeError when parameters to read_sql_table are passed as mentioned in documentation.

Code:

from sqlalchemy.sql import text
from sqlalchemy.engine import create_engine
import dask.dataframe as ddf


DIALECT = '<value>'
SQL_DRIVER= '<value>'
USERNAME= '<value>'
PASSWORD = '<value>'
HOSTNAME = '<value>'
PORT = '<value>'
SID = '<value>'
ENGINE_PATH = DIALECT + '+' + SQL_DRIVER + '://' + USERNAME + ':' + PASSWORD +'@' + HOSTNAME + ':' + str(PORT) + '/' + SID
s = text("My complicated sql query")
df = ddf.read_sql_table(s, ENGINE_PATH, index_col='id', npartitions=10)

Error seen:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/synology/data/ganesh/arun/code_jan_27/art_rematching/venv_3.8_50/lib/python3.8/site-packages/dask/dataframe/io/sql.py", line 115, in read_sql_table
    index = table.columns[index_col] if isinstance(index_col, str) else index_col
TypeError: 'method' object is not subscriptable

Upvotes: 0

Views: 133

Answers (1)

actuary_meets_data
actuary_meets_data

Reputation: 63

So dask does not currently support complex queries directly in the text mechanism (as of v2021.02.0). My workaround is as follows:

  1. Save your query as a view in the database
  2. Construct the sqlalchemy uri string (as you have done in the question)
  3. Initialize a "Table" object that points to your view
  4. Pass everything to the read_sql_table function

See below for a simple example:

from sqlalchemy import Table, Metadata, Column, Integer
import dask.dataframe as ddf
import multiprocessing

uri = f'{dialect}://{user}:{password}@{host}:{port}/{dbName}'
view = '[NAME_OF_VIEW]'
schema = '[NAME_OF_SCHEMA]'
pkey = '[PRIMARY_KEY_COLUMN]'
myview = Table(view, Metadata(schema=schema), Column(pkey, Integer, primary_key=True))

df = ddf.read_sql_table(table=myview, uri=uri, index_col=pkey, schema=schema, npartitions=multiprocessing.cpu_count()*3)

Upvotes: 1

Related Questions