Reputation: 259
I want to write every grouped DataSet to its own CSV file.
Example of the data:
A,123
B,200
A,400
B,400
So my desired output is:
file 1:
A,123
A,400
file 2:
B,200
B,400
So basically, the grouping code for exampleData is:
exampleData.groupBy(0).sortGroup(1, Order.ASCENDING)
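For context, this is a minimal, self-contained version of the setup (exampleData is assumed to be a DataSet[(String, Int)] read from the sample rows above, stored in a hypothetical input.csv):

    import org.apache.flink.api.scala._
    import org.apache.flink.api.common.operators.Order

    val env = ExecutionEnvironment.getExecutionEnvironment

    // (key, value) records like ("A", 123) -- schema assumed from the sample data
    val exampleData: DataSet[(String, Int)] =
      env.readCsvFile[(String, Int)]("input.csv") // hypothetical input path

    // group by the key field and sort each group by the value field
    val grouped = exampleData.groupBy(0).sortGroup(1, Order.ASCENDING)
    // grouped is what I now want to split into one CSV file per key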
Now I want to output each grouped DataSet to a different CSV file. What is the best practice to achieve this?
I'm using Scala 2.11.12 and Flink 1.11.0.
Upvotes: 1
Views: 344
Reputation: 9265
What you need is a bucketing sink, but that's currently only supported for streaming jobs, not batch. Flink 1.12 has unified batch & streaming, so in theory that might work for you. I implemented my own bucketing sink for batch jobs, but it seems to have some issues with recent versions of Hadoop, which I need to debug.
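As a stopgap on the DataSet API you can collect the distinct keys on the driver and add one filtered CSV sink per key. This is only a minimal sketch, not the bucketing sink approach above: the (String, Int) schema is taken from your sample data, the input/output paths are placeholders, and it only scales to a modest number of groups because the input is filtered once per key.

    import org.apache.flink.api.scala._
    import org.apache.flink.api.common.operators.Order
    import org.apache.flink.core.fs.FileSystem

    object GroupedCsvOutput {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment

        // (key, value) records, e.g. ("A", 123) -- schema assumed from the sample data
        val exampleData: DataSet[(String, Int)] =
          env.readCsvFile[(String, Int)]("input.csv") // hypothetical input path

        // Pull the distinct keys back to the client; this triggers a first job
        // and is only reasonable when the number of groups is small.
        val keys: Seq[String] = exampleData.map(_._1).distinct().collect()

        // Add one CSV sink per key: filter that key's rows, sort them by value,
        // and write them as a single file (parallelism 1).
        keys.foreach { key =>
          exampleData
            .filter(_._1 == key)
            .sortPartition(1, Order.ASCENDING)
            .setParallelism(1)
            .writeAsCsv(s"output/$key.csv", writeMode = FileSystem.WriteMode.OVERWRITE) // hypothetical output dir
            .setParallelism(1)
        }

        // Run the job that contains all the per-key sinks
        env.execute("one csv per group")
      }
    }

For a large or unknown number of keys this is wasteful, since the input is re-read for every group; in that case a custom FileOutputFormat that buckets records by key (essentially what a bucketing sink does) is the better route.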
Upvotes: 1