Reputation: 422
Does anyone have a nice, neat, and stable way to achieve the equivalent of:
pandas.read_sql(sql, con, chunksize=None)
and/or
pandas.read_sql_table(table_name, con, schema=None, chunksize=None)
connected to Redshift with SQLAlchemy & psycopg2, directly into a Dask DataFrame?
The solution should be able to handle large amounts of data.
Upvotes: 4
Views: 1972
Reputation: 57251
You might consider the read_sql_table
function in dask.dataframe.
http://dask.pydata.org/en/latest/dataframe-api.html#dask.dataframe.read_sql_table
>>> df = dd.read_sql_table('accounts', 'sqlite:///path/to/bank.db',
... npartitions=10, index_col='id') # doctest: +SKIP
This relies on the pandas.read_sql_table
function internally, so it should operate under the same restrictions, except that you are now asked to provide a number of partitions and an index column (used to split the table into partitions).
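Applied to Redshift, a minimal sketch might look like the following. The URI, the table name 'accounts', and the index column 'id' are placeholders; the redshift+psycopg2 dialect assumes the sqlalchemy-redshift package is installed, though a plain postgresql+psycopg2 URI often works as well since Redshift speaks the Postgres protocol.
import dask.dataframe as dd

# Placeholder SQLAlchemy URI -- substitute your own cluster, credentials and database
uri = 'redshift+psycopg2://user:password@my-cluster.example.com:5439/mydb'

df = dd.read_sql_table(
    'accounts',        # table to read (placeholder name)
    uri,               # SQLAlchemy connection string
    index_col='id',    # numeric or datetime column used to split the table
    npartitions=10,    # number of partitions to break the table into
)

df.head()  # df is lazy; each partition is fetched via pandas.read_sql_table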
Upvotes: 1