Sean

Reputation: 689

Packages for reading parquets in NodeJS (2024)

I'm creating a Lambda in Node.js that parses Parquet (version 2.0) files into JSON arrays. I have tried the following libraries, all of which failed for various reasons:

It really seems like all support for Parquet parsing in Node.js has either been discontinued or requires clearing big hurdles to use. There must be some Parquet parsing libraries for Node.js that are still supported and work well with TypeScript. Do any of y'all have suggestions or words of wisdom for this?

Upvotes: 2

Views: 266

Answers (2)

platypii

Reputation: 1

Check out hyparquet. It's actively maintained, supports all modern Parquet files, and is written in pure JS with no dependencies. I've confirmed that it works in the Lambda runtime, Node, and the browser.
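A minimal sketch of what reading a file into a JSON array could look like, assuming an ESM project with hyparquet installed and its `asyncBufferFromFile` and `parquetReadObjects` exports as described in its README. The names `jsonSafe` and `readParquetAsJson` are made up for this example. The replacer is there because 64-bit integer columns typically come back as `BigInt`, which `JSON.stringify` refuses to serialize on its own.

```typescript
// JSON.stringify replacer: BigInt has no JSON representation, so emit a
// plain number when it fits exactly and fall back to a string otherwise.
export function jsonSafe(_key: string, value: unknown): unknown {
  if (typeof value === 'bigint') {
    const asNumber = Number(value);
    return Number.isSafeInteger(asNumber) ? asNumber : value.toString();
  }
  return value;
}

// Hypothetical helper: read every row of a local parquet file and
// serialize the rows as a JSON array of objects.
export async function readParquetAsJson(path: string): Promise<string> {
  // Loaded lazily so the cost is only paid when a file is actually read.
  const { asyncBufferFromFile, parquetReadObjects } = await import('hyparquet');
  const file = await asyncBufferFromFile(path);
  const rows = await parquetReadObjects({ file });
  return JSON.stringify(rows, jsonSafe);
}
```

In a Lambda handler this would be called as `await readParquetAsJson('/tmp/data.parquet')`, where the path is only a placeholder for wherever the file has been downloaded.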

Upvotes: 0

Carlo Piovesan

Reputation: 332

The duckdb-wasm npm module is big because it also includes tests and several deployment variants. A minimal stripped-down version should be around 40 MB uncompressed / 7.3 MB compressed.

The duckdb module is possibly also an option.

Both duckdb and duckdb-wasm use the same underlying library. Only the API is somewhat different, and the execution models differ (native on one side, a Wasm sandbox on the other). Both are in active development.

Upvotes: -1
