Salvador Vigo

Reputation: 457

Consume multiple text files with Apache Flink DataSet API

I am writing a batch job with Apache Flink using the DataSet API. I can read a text file with readTextFile(), but this function reads only one file at a time.

I would like to consume all the text files in my directory one by one and process them in the same function, as a single batch job with the DataSet API, if that is possible.

Another option would be to implement a loop that runs multiple jobs, one per file, instead of a single job over multiple files. But I don't think that solution is the best one.
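To illustrate, the loop-based workaround I am considering looks roughly like this (a sketch; the file paths and the map step are placeholders, not my real pipeline):

```java
import org.apache.flink.api.java.ExecutionEnvironment;

public class PerFileJobs {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical list of input files in my directory
        String[] inputs = {"file:///data/a.txt", "file:///data/b.txt"};

        for (String path : inputs) {
            // Each print() is a sink that triggers execution,
            // so every loop iteration launches a separate Flink job.
            env.readTextFile(path)
               .map(String::toUpperCase) // placeholder processing
               .print();
        }
    }
}
```

This works, but it submits one job per file, which is what I would like to avoid.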

Any suggestion?

Upvotes: 1

Views: 854

Answers (1)

TobiSH

Reputation: 2921

If I read the documentation right, you can read an entire directory by passing its path to ExecutionEnvironment.readTextFile(); Flink will then consume all files in that directory as a single DataSet. You can find an example here: Word-Count-Batch-Example
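A minimal sketch of that approach (the directory path is hypothetical; for nested subdirectories you would additionally need to enable the `recursive.file.enumeration` configuration option on the source):

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class ReadDirectoryJob {
    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Pointing readTextFile() at a directory (not a single file)
        // makes Flink read every file inside it into one DataSet.
        DataSet<String> lines = env.readTextFile("file:///path/to/my/directory");

        // All lines from all files flow through the same operators,
        // as one batch job.
        lines.print();
    }
}
```

So instead of looping over files and submitting one job each, you get a single job whose source enumerates the whole directory.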


Upvotes: 1
