Reputation: 1248
Is it possible to perform distributed concurrent writes to parquet format?
And is it possible to read parquet files while they are being written?
If there are methods for concurrent reads and writes, I'd be interested to learn about them.
Upvotes: 25
Views: 5131
Reputation: 1248
I eventually got an answer from the Parquet developers. The answer is no to both questions:
Parquet writers are not thread-safe and files cannot be read or written by different readers or writers concurrently. Parquet doesn't expose flush/sync operations to the user (for good reason) so there isn't a way to reliably do this anyway.
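A workaround that fits these limits is to never share a file: each writer produces its own part file in a common directory, and a part only becomes visible to readers once it is complete. The sketch below shows that idea under some assumptions. `publish_part`, `visible_parts`, the `part-*.parquet` naming and the `.tmp-` prefix are hypothetical names of my own, not part of any Parquet API, and `write_fn` stands in for whatever actually writes the Parquet file. The atomic rename is assumed to happen on a local POSIX filesystem; object stores and some distributed filesystems behave differently.

```python
import os
import tempfile
import uuid


def publish_part(dataset_dir, write_fn):
    """Write one part file privately, then expose it with an atomic rename.

    write_fn(path) is a caller-supplied function (hypothetical) that writes
    a complete Parquet file to the given path and closes it.
    """
    os.makedirs(dataset_dir, exist_ok=True)
    # Keep the temp file in the same directory so the rename stays on one
    # filesystem, which os.replace needs in order to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dataset_dir, prefix=".tmp-")
    os.close(fd)
    try:
        # Only this writer ever touches tmp_path, so no writer is shared.
        write_fn(tmp_path)
        final_path = os.path.join(
            dataset_dir, f"part-{uuid.uuid4().hex}.parquet"
        )
        # Readers see either no file or the whole file, never a partial one.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave a half-written temp file behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


def visible_parts(dataset_dir):
    """Return only completed part files; in-progress temp files are skipped."""
    return sorted(
        os.path.join(dataset_dir, name)
        for name in os.listdir(dataset_dir)
        if name.startswith("part-") and name.endswith(".parquet")
    )
```

With pyarrow, `write_fn` could be something like `lambda path: pyarrow.parquet.write_table(table, path)`. Readers then open only the paths returned by `visible_parts`, so they never touch a file that is still being written. This does not make a single Parquet file concurrently writable; it only sidesteps the restriction by giving every writer its own file.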
Upvotes: 26