Artem Tikhomirov

Reputation: 21676

Apache Flink: Reading parquet files from S3 in Data Stream APIs

We have several external jobs that produce small (500 MiB) Parquet objects on S3, partitioned by time. The goal is to build an application that reads those files, joins them on a specific key, and writes the result to a Kinesis stream or another S3 bucket.

Can this be achieved with Flink alone? Can Flink monitor for new S3 objects as they are created and load them into the application?

Upvotes: 0

Views: 1178

Answers (1)

kkrugler

Reputation: 9245

The newer FileSource class (available in recent Flink versions) supports monitoring a directory for new or modified files. See FileSource.forBulkFileFormat() in particular for reading Parquet files.

You use the FileSourceBuilder returned by the above method call, and then call .monitorContinuously(Duration.ofHours(1)) (or whatever interval makes sense).
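A minimal sketch of what this could look like, assuming Flink 1.15+ with the flink-connector-files and flink-parquet dependencies on the classpath. It uses FileSource.forRecordStreamFormat() with AvroParquetReaders as one convenient way to read Parquet records (forBulkFileFormat() with a columnar input format is the other option). The bucket path and the schema fields (userId, eventTime) are hypothetical placeholders for your actual data:

```java
import java.time.Duration;

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.AvroParquetReaders;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParquetS3Monitor {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical Avro schema matching the fields in the Parquet files.
        Schema schema = SchemaBuilder.record("Event").fields()
                .requiredString("userId")
                .requiredLong("eventTime")
                .endRecord();

        // Build a FileSource that re-scans the S3 prefix for new objects
        // every hour; without monitorContinuously() it reads once and stops.
        FileSource<GenericRecord> source = FileSource
                .forRecordStreamFormat(
                        AvroParquetReaders.forGenericRecord(schema),
                        new Path("s3://my-bucket/events/")) // hypothetical bucket
                .monitorContinuously(Duration.ofHours(1))
                .build();

        DataStream<GenericRecord> events = env.fromSource(
                source, WatermarkStrategy.noWatermarks(), "parquet-s3-source");

        events.print();
        env.execute("Monitor Parquet on S3");
    }
}
```

Note that running this unbounded (monitored) source puts the job in streaming mode, so a downstream join on your key would typically be a windowed or interval join, with the result written out via a Kinesis or file sink.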

Upvotes: 1
