Reputation: 307
I have a (possibly) quite large ResultSet when querying a (Vertica) SQL database, and because it won't fit in memory I'm trying to find a way to stream the results (in CSV form) directly to an S3 bucket.
Most S3-related documentation and questions revolve around uploading files or directories, where you can just pass an InputStream to the base S3 SDK or the TransferManager, but those aren't applicable here.
It also seems that in all cases the S3 SDK needs to know the content length beforehand, but that's obviously not possible with a query result.
There is https://github.com/alexmojaki/s3-stream-upload, but that uses v1 of the SDK, which is due to be deprecated soon (2025), so that doesn't seem like a viable long-term option either.
Will my only option be to manually create batches of 5MB and do a multi-part upload using a more low-level SDK? Or are there other options?
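To make the manual-batching idea concrete, here is a minimal sketch of the buffering half only, with no S3 dependency. `PartSink` and `PartBufferingOutputStream` are hypothetical names I made up for illustration; the sink stands in for whatever call uploads one part (with SDK v2 that would presumably be `S3Client.uploadPart` between `createMultipartUpload` and `completeMultipartUpload`). It assumes the CSV writer can write to a plain `OutputStream`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical callback: a real implementation would upload one multipart part. */
interface PartSink {
    void accept(int partNumber, byte[] data) throws IOException;
}

/**
 * Buffers written bytes and hands them to the sink once at least
 * minPartSize bytes have accumulated. Only the final part, emitted
 * on close(), may be smaller than minPartSize.
 */
final class PartBufferingOutputStream extends OutputStream {
    private final int minPartSize;
    private final PartSink sink;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int nextPartNumber = 1; // S3 part numbers start at 1
    private boolean closed;

    PartBufferingOutputStream(int minPartSize, PartSink sink) {
        this.minPartSize = minPartSize;
        this.sink = sink;
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        emitIfFull();
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        buffer.write(b, off, len);
        emitIfFull();
    }

    private void emitIfFull() throws IOException {
        if (buffer.size() >= minPartSize) {
            emitPart();
        }
    }

    private void emitPart() throws IOException {
        sink.accept(nextPartNumber++, buffer.toByteArray());
        buffer.reset();
    }

    /** Emits whatever is left as the last part; emits nothing if no bytes remain. */
    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (buffer.size() > 0) {
            emitPart();
        }
    }
}
```

With a 5 MiB `minPartSize`, only one part is held in memory at a time, so memory use stays bounded regardless of the size of the ResultSet. A multipart upload allows at most 10,000 parts, so for very large results the part size would have to be raised accordingly.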
Upvotes: 0
Views: 154
Reputation: 6749
Use EXPORT TO DELIMITED() from vsql, and you don't need any Java code.
-- creating a test table ...
CREATE TABLE indata(uid,cid,att1) AS
SELECT 16,78940,'yel,k'
UNION ALL SELECT 17,78940,'master#$;@'
UNION ALL SELECT 15,78940,'"hello , how are you"'
UNION ALL SELECT NULL,NULL,NULL
;
This is the export command:
EXPORT TO DELIMITED(
directory='s3://tmp/export'
, filename='indata'
, addHeader='true'
, delimiter=','
, enclosedBy='"'
, escapeAs ='"'
) OVER(PARTITION BY uid) AS
SELECT * FROM indata;
s3://tmp/export/ will then contain directories, files and data like the following (this listing comes from an export to a POSIX file system, as I currently have no access to S3):
[dbadmin@mgessner01 export]$ find /tmp/export -name "*.csv" -exec head -v {} \;
==> /tmp/export/uid=/d90efa00-v_sbx_node0001-140006419035904-0.csv <==
uid,cid,att1
,,""
==> /tmp/export/uid=17/d90efa00-v_sbx_node0001-140006419035904-0.csv <==
uid,cid,att1
17,78940,"master#$;@"
==> /tmp/export/uid=15/2dca6b78-v_sbx_node0003-140230643066624-0.csv <==
uid,cid,att1
15,78940,"""hello ", how are you"""
==> /tmp/export/uid=16/2dca6b78-v_sbx_node0003-140230643066624-0.csv <==
uid,cid,att1
16,78940,"yel",k"
Upvotes: 0