TheCrystalShip

Reputation: 259

Apache Flink: output a CSV file for each group in a GroupedDataSet

I want to write each group of a GroupedDataSet to its own CSV file.

Example of the data:

A,123
B,200
A,400
B,400

So my desired output is:

file 1:

A,123
A,400

file 2:

B,200
B,400

So basically, a simple piece of code for exampleData would be:

exampleData.groupBy(0).sortGroup(1, Order.ASCENDING)

Now I want to output each group to a different CSV file. What is the best practice to achieve this?

I'm using Scala version 2.11.12 and Flink version 1.11.0.

Upvotes: 1

Views: 344

Answers (1)

kkrugler

Reputation: 9265

What you need is a bucketing sink, but that's currently supported only for streaming jobs, not batch. Flink 1.12 has unified batch and streaming, so in theory the streaming sink might work for you there. I implemented my own bucketing sink for batch jobs, but it seems to have some issues with recent versions of Hadoop, which I still need to debug.
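
Until a bucketing sink is available for batch, one possible workaround with the plain DataSet API is to collect the distinct keys and write one filtered, sorted DataSet per key. This is only a minimal sketch, not a bucketing sink: it assumes the number of groups is small enough to collect on the client, it assumes the rows are `(String, Int)` tuples as in the question's example, and the names `GroupsToCsv`, `writePerKey` and the output directory are made up for illustration.

```scala
import org.apache.flink.api.common.operators.Order
import org.apache.flink.api.scala._
import org.apache.flink.core.fs.FileSystem.WriteMode

// Hypothetical helper object; not part of Flink.
object GroupsToCsv {

  // Registers one CSV sink per distinct key (field 0), rows sorted by field 1.
  // Returns the keys that were found. The sinks run on the next env.execute().
  def writePerKey(data: DataSet[(String, Int)], outDir: String): Seq[String] = {
    // collect() runs a separate job and brings the distinct keys to the client,
    // so this only suits a small number of groups.
    val keys: Seq[String] = data.map(_._1).distinct().collect()

    keys.foreach { key =>
      data
        .filter(_._1 == key)
        .sortPartition(1, Order.ASCENDING)
        .setParallelism(1) // a single partition, so the sort is a total order
        .writeAsCsv(s"$outDir/$key.csv", writeMode = WriteMode.OVERWRITE)
        .setParallelism(1) // a single sink task, so the output is a single file
    }
    keys
  }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val exampleData = env.fromElements(("A", 123), ("B", 200), ("A", 400), ("B", 400))

    // Assumed output location; replace with a real path.
    writePerKey(exampleData, "file:///tmp/flink-groups")
    env.execute("one CSV per group")
  }
}
```

The trade-off is that every key adds another filter and sink to the job plan, so this does not scale to many groups; for that case a real bucketing sink (or a custom OutputFormat) is still the better fit.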

Upvotes: 1
