Reputation: 77
I have Paraquet files in my S3 bucket which is not AWS S3.
Is there a tool that connects to any S3 service (like Wasabi, Digital Ocean, MinIO), and allows me to query the Parquet files?
Upvotes: 4
Views: 10509
Reputation: 2615
QStudio is a free SQL client that bundles DuckDB as part of the download: https://www.timestored.com/qstudio/help/duckdb-sql-editor
Go File Menu -> New DuckDB database then use the commands ns15 provided:
-- Highlight and press Control+E to Execute all commands one by one
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-east-1';
SET s3_access_key_id='****';
SET s3_secret_access_key='****';
-- Read the data using S3 protocol
select * from read_parquet('s3://test-me-ses/userdata1.parquet');
select * from read_parquet('https://www.timestored.com/data/sample/titanic.parquet');
If you install QStudio in windows it adds a parquet association that allows just double clicking to open and query a parquet file: https://www.timestored.com/qstudio/parquet-file-viewer
Disclaimer: I'm the author since 2013.
Upvotes: 0
Reputation: 8844
In case you need a GUI tool then you can use DBeaver + DuckDB. For programmatic use, You can find DuckDB library for most languages.
Here is my other answer on the same topic.
There is a slight difference since you are querying data on a S3 compatible storage. You simply need to run few additional commands mentioned here in the DuckDB docs.
-- Execute these commnds one by one or run as a script from DBeaver
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-east-1';
SET s3_access_key_id='****';
SET s3_secret_access_key='****';
-- Read the data using S3 protocol
select * from read_parquet('s3://test-me-ses/userdata1.parquet');
In case you have parquet files hosted and served from S3 or any web server via HTTP - DuckDB has this covered as well.
-- Read the data using HTTP protocol if parquet file is hosted
SELECT * FROM read_parquet('https://test-me-ses.s3.amazonaws.com/userdata1.parquet');
Any S3 compatible object store(Wasabi, Digital Ocean, MinIO) should work similarly..
You can also write the data back as parquet after transformation to any S3 compatible storage(AWS, MinIO etc..).
All of these can be done programmatically as well.
Upvotes: 5
Reputation: 1348
With MongoDB this can be done with our Atlas Data Federation product https://www.mongodb.com/docs/atlas/data-federation/overview/
It can query parquet files stored in S3.
Upvotes: 0