kfk

Reputation: 841

Is there a way to run SQL statements on parquet files using dask?

Spark has a functionality that allows users to run SQL statements on a Spark dataframe. What about Dask? If it is not available now, is it something being considered?

Upvotes: 0

Views: 1252

Answers (3)

nilpferd1991

Reputation: 166

There is also dask-sql (disclaimer: I am the author), which allows you to run arbitrary SQL queries against Dask dataframes (or any data that can be loaded with Dask, e.g. Parquet files).

For example, after installing it with conda install dask-sql, you can run:

from dask_sql import Context

# the Context keeps track of all registered tables
c = Context()

# register the Parquet data under a table name, then query it with SQL
c.create_table("my_table", "/some/path/to/parquet")
c.sql("SELECT * FROM my_table").compute()

dask-sql is very similar to the already mentioned BlazingSQL, but it also runs without a GPU (cluster).

Upvotes: 3

Mike McCarty

Reputation: 21

BlazingSQL provides a distributed SQL engine in Python that works with Parquet files. It is built on RAPIDS, so it requires NVIDIA GPUs.

Upvotes: 2

Alex Fedotov

Reputation: 582

Presto / AWS Athena may be an answer to your question: both engines can run SQL directly against Parquet files.
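For instance, with Athena you first register the existing Parquet files as an external table and can then query them with plain SQL. A minimal sketch (the S3 path and column names below are hypothetical placeholders):

```sql
-- Register existing Parquet files as an external table
-- (the S3 location and columns are example values, not from the question)
CREATE EXTERNAL TABLE my_table (
    id BIGINT,
    name STRING
)
STORED AS PARQUET
LOCATION 's3://my-bucket/some/path/';

-- Query the Parquet data in place
SELECT name, COUNT(*) AS n
FROM my_table
GROUP BY name;
```

Athena is serverless and bills per data scanned, so no Dask or Spark cluster is needed for this kind of ad-hoc query.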

Upvotes: 0
