Reputation: 431
I have a bunch of columns (around 30). Some of them are arrays, and some are text fields containing multiple line breaks (content from Word documents), etc. I think CSV is not an apt format because of the embedded newlines, so I am thinking of using the Parquet format.
The file itself needs to be generated via Node.js.
Any pointers would be helpful.
Upvotes: 4
Views: 4006
Reputation: 8684
Node.js libraries for Parquet are not well maintained. You can check out my other answer on the same topic; it lists the most popular ones.
I would suggest giving the library below a try.
DuckDB - DuckDB is an in-process embedded database library. It has a lot of features built around Parquet files.
It can write a Parquet file to disk, write it directly to an S3 bucket, and so on.
Parquet features supported by DuckDB: https://duckdb.org/docs/data/parquet . Here is a simple snippet.
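var duckdb = require('duckdb');
// An in-memory database is enough; only the Parquet output is written to disk.
var db = new duckdb.Database(':memory:');
// COPY writes the result of the SELECT to a Parquet file.
db.all("COPY (SELECT 'BOB' AS NAME, 'LONDON' AS CITY) TO 'result-snappy.parquet' (FORMAT 'parquet')", function(err, res) {
    if (err) {
        throw err;
    }
    // res holds the number of rows written
    console.log(res);
});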
Executing the script:
PS C:\Users\user1\Downloads> node .\duck-script.js
[ { Count: 1 } ]
In your case, you might have to load the data into a DuckDB table first and then write it to a Parquet file.
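One possible way to do that load step, as a rough sketch rather than a definitive recipe: dump the rows to a newline-delimited JSON file and let DuckDB read it back before copying to Parquet. This assumes the DuckDB build in use ships the JSON reader (read_json_auto); the helper names (toNdjson, sqlString, buildCopySql, writeParquet) and the file paths are made up for illustration. JSON is used as the intermediate format because it escapes embedded newlines and can carry arrays, which are the two things that make CSV awkward here.

```javascript
const fs = require('fs');

// Serialize rows as newline-delimited JSON. JSON.stringify escapes embedded
// newlines as \n, so every record stays on a single physical line.
function toNdjson(rows) {
  return rows.map((row) => JSON.stringify(row)).join('\n') + '\n';
}

// Quote a value as a single-quoted SQL string literal (hypothetical helper).
function sqlString(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Build the COPY statement that reads the JSON file and writes Parquet.
// Assumption: read_json_auto is available in the DuckDB build being used.
function buildCopySql(jsonPath, parquetPath) {
  return 'COPY (SELECT * FROM read_json_auto(' + sqlString(jsonPath) + ')) ' +
    'TO ' + sqlString(parquetPath) + " (FORMAT 'parquet')";
}

// Write the rows to an intermediate JSON file, then convert it to Parquet.
function writeParquet(rows, jsonPath, parquetPath, callback) {
  // Required lazily so the helpers above work without the duckdb package.
  const duckdb = require('duckdb');
  fs.writeFileSync(jsonPath, toNdjson(rows));
  const db = new duckdb.Database(':memory:');
  db.all(buildCopySql(jsonPath, parquetPath), callback);
}
```

It could be called as writeParquet(rows, 'rows.ndjson', 'result.parquet', callback), where rows is an array of plain objects such as { id: 1, tags: ['a', 'b'], body: 'line one\nline two' }. Column types are inferred by DuckDB from the JSON, so it is worth checking the resulting schema before relying on it.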
Upvotes: 6