Reputation: 175
I'd like to write streaming data into separate directories partitioned by key, similar to partitionBy
in Spark.
Data:
key1 1
key2 1
key1 2
key3 1
...
Expected output:
/output_directory/key1
+-- part-0-0
+-- part-0-1
+-- ...
/output_directory/key2
+-- part-0-0
+-- part-0-1
+-- ...
/output_directory/key3
+-- part-0-0
+-- part-0-1
+-- ...
...
(I have read about stream splitting and custom sinks in Flink, but stream splitting does not seem suited to cases where the keys are not known in advance, and I would like to explore other options before implementing a custom sink.)
Are there built-in methods or third-party libraries in Flink that achieve this?
Upvotes: 0
Views: 145
Reputation: 43499
Take a look at Flink's FileSink, used together with a BucketAssigner that assigns each stream element to a bucket (a subdirectory) based on its key. The keys do not need to be known in advance; a new bucket directory is created whenever a new bucket id is returned.
For more details, see https://ci.apache.org/projects/flink/flink-docs-stable/docs/connectors/datastream/file_sink/.
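FileSink and BucketAssigner are real Flink APIs (FileSink is available from Flink 1.12 onward). A minimal sketch, assuming the elements arrive as Tuple2<String, Integer> pairs matching the sample data, and using /output_directory as the base path from the question:

```java
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.streaming.api.functions.sink.filesystem.BucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.SimpleVersionedStringSerializer;

// Assigns each element to a bucket (subdirectory) named after its key,
// e.g. ("key1", 2) is written under /output_directory/key1/.
// Assumes Tuple2<key, value> elements; adjust the accessor for your type.
public class KeyBucketAssigner
        implements BucketAssigner<Tuple2<String, Integer>, String> {

    @Override
    public String getBucketId(Tuple2<String, Integer> element, Context context) {
        return element.f0; // the key becomes the bucket directory name
    }

    @Override
    public SimpleVersionedSerializer<String> getSerializer() {
        // Buckets are identified by plain strings, so the built-in
        // string serializer is sufficient for checkpointing.
        return SimpleVersionedStringSerializer.INSTANCE;
    }

    // Builds a row-format FileSink that writes one line of text per record
    // into per-key subdirectories under the given base path.
    public static FileSink<Tuple2<String, Integer>> buildSink(String basePath) {
        return FileSink
                .forRowFormat(new Path(basePath),
                        new SimpleStringEncoder<Tuple2<String, Integer>>("UTF-8"))
                .withBucketAssigner(new KeyBucketAssigner())
                .build();
    }
}
```

Attach it to the stream with `stream.sinkTo(KeyBucketAssigner.buildSink("/output_directory"));`. Note that the part-file names are controlled separately (via OutputFileConfig and the rolling policy), so the exact part-0-0 naming above may differ from the defaults.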
Upvotes: 1