Sunil

Reputation: 431

NodeJS Parquet write

I have a bunch of columns (around 30). Some of them are arrays, and some are text fields with multiple lines (Word document content), etc. I don't think CSV is an apt format because of the embedded newlines, so I am thinking of using the Parquet format.

The file itself needs to be generated via NodeJS.

  1. Is there a preferred library for Parquet?
  2. Also, is there any recommendation for a specific file format for BigQuery?

Any pointers would be helpful.

Upvotes: 4

Views: 4006

Answers (1)

ns15

Reputation: 8684

Node.js libraries for Parquet are not well maintained. You can check out my other answer on the same topic; it lists the most popular ones.

  • parquetjs
  • parquets
  • parquetjs-lite
  • node-parquet

I would suggest giving the library below a try.

DuckDB - DuckDB is an in-process, embedded database/library. It has a lot of features built around Parquet files.

It can write Parquet files to disk, write them directly to an S3 bucket, etc.

Parquet features supported by DuckDB - https://duckdb.org/docs/data/parquet . Here is a simple snippet.

var duckdb = require('duckdb');

// In-memory DuckDB instance; no external database server is required
var db = new duckdb.Database(':memory:');

// COPY ... TO writes the query result straight to a Parquet file on disk
db.all("COPY (SELECT 'BOB' AS NAME, 'LONDON' AS CITY) TO 'result-snappy.parquet' (FORMAT 'parquet')", function(err, res) {
  if (err) {
    throw err;
  }
  console.log(res);
});

Executing the script:

PS C:\Users\user1\Downloads> node .\duck-script.js
[ { Count: 1 } ]
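For writing directly to an S3 bucket (mentioned above), DuckDB uses its httpfs extension. The following is only a rough sketch; the bucket name, region, and credentials are placeholders you would replace.

var duckdb = require('duckdb');
var db = new duckdb.Database(':memory:');

// Load the httpfs extension and set S3 credentials (placeholder values),
// then COPY the query result straight to the bucket as a Parquet file
db.exec(
  "INSTALL httpfs; LOAD httpfs;" +
  " SET s3_region='us-east-1';" +
  " SET s3_access_key_id='YOUR_KEY_ID';" +
  " SET s3_secret_access_key='YOUR_SECRET';",
  function (err) {
    if (err) throw err;
    db.all(
      "COPY (SELECT 'BOB' AS NAME, 'LONDON' AS CITY) TO 's3://my-bucket/result.parquet' (FORMAT 'parquet')",
      function (err, res) {
        if (err) throw err;
        console.log(res);
      }
    );
  }
);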


In your case, you might have to load the data into a DuckDB table first and then write it out to a Parquet file, as in the sketch below.
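This is only a minimal sketch of that load-then-export flow; the table, columns, and sample row are made up for illustration (in practice you would insert your ~30 columns or bulk-load from your source data).

var duckdb = require('duckdb');
var db = new duckdb.Database(':memory:');

// Create a table that mixes a plain column, multi-line text and an array column
db.run("CREATE TABLE docs (id INTEGER, body VARCHAR, tags VARCHAR[])", function (err) {
  if (err) throw err;
  // Multi-line text and arrays round-trip cleanly in Parquet, unlike CSV
  db.run("INSERT INTO docs VALUES (1, 'line one' || chr(10) || 'line two', ['a', 'b'])", function (err) {
    if (err) throw err;
    // Export the whole table to a Parquet file
    db.all("COPY docs TO 'docs.parquet' (FORMAT 'parquet')", function (err, res) {
      if (err) throw err;
      console.log(res); // e.g. [ { Count: 1 } ]
    });
  });
});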

DOCS

Upvotes: 6
