Reputation: 841
Spark has functionality that lets users run SQL statements against a Spark dataframe. Does Dask offer anything similar? If not, is it something being considered?
Upvotes: 0
Views: 1252
Reputation: 166
There is also dask-sql (disclaimer: I am the author), which allows you to run arbitrary SQL queries against Dask dataframes (or any data that Dask can load, e.g. Parquet files).
For example, after installing it with conda install dask-sql, you can run:
from dask_sql import Context

# Create a context to register tables and run queries against them
c = Context()

# Register a Parquet dataset under the table name "my_table"
c.create_table("my_table", "/some/path/to/parquet")

# Run a SQL query and materialize the result as a pandas dataframe
c.sql("SELECT * FROM my_table").compute()
dask-sql is very similar to the already mentioned BlazingSQL, but it also runs without a GPU (cluster).
Upvotes: 3
Reputation: 21
BlazingSQL provides a distributed SQL engine in Python that works with Parquet files. It is built on RAPIDS, so it requires NVIDIA GPUs.
Upvotes: 2