dfvt

Reputation: 87

Flink: read CSV files across multiple hosts

I have a cluster set up as described in https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/cluster_setup.html. Each worker host holds several CSV files, which together form that host's shard of the data. I want to use the Table API to calculate the sum of a CSV column across all hosts. Each worker should calculate the sum over the CSV files it holds and return the result to the master. Is this possible, and if so, what do I need to implement?

Upvotes: 0

Views: 160

Answers (1)

Fabian Hueske

Reputation: 18987

If I understand your question correctly, you'd like to read CSV files and sum up some fields. That's a rather simple query and not a problem for Flink.

With the latest Flink version (1.4.2), you can register a CsvTableSource as a table and run a query like SELECT sum(a), sum(b) FROM yourTable.

Note that the CSV files should be stored in a file system that is accessible from all machines (distributed file system, NFS, ...).
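To make the shape of that computation concrete, here is a minimal sketch in plain Java. It is not Flink code and does not use any Flink API. It only illustrates the idea from the question: each host sums a column over its own shard, and the partial sums are then combined into one total. The class name ShardSum, the method names, and the sample rows are all hypothetical, and the sketch assumes comma-separated files without a header row and with integer values.

```java
import java.util.Arrays;
import java.util.List;

// Conceptual illustration only: plain Java, no Flink involved.
// All names and sample data here are hypothetical.
class ShardSum {

    // Sums one column over the rows of a single CSV shard.
    // Assumes comma-separated rows, no header line, integer values.
    static long sumColumn(List<String> csvLines, int columnIndex) {
        long sum = 0;
        for (String line : csvLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            sum += Long.parseLong(fields[columnIndex].trim());
        }
        return sum;
    }

    // Combines the per-shard partial sums into the overall sum.
    static long combine(List<Long> partialSums) {
        long total = 0;
        for (long partial : partialSums) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Two shards, standing in for the CSV rows held by two hosts.
        List<String> shardOnHostA = Arrays.asList("1,10", "2,20");
        List<String> shardOnHostB = Arrays.asList("3,30", "4,40");

        // Each "host" computes a partial sum over column 0 of its shard.
        long partialA = sumColumn(shardOnHostA, 0);
        long partialB = sumColumn(shardOnHostB, 0);

        // The partial sums are combined into the final result.
        long total = combine(Arrays.asList(partialA, partialB));
        System.out.println("sum(a) = " + total); // prints "sum(a) = 10"
    }
}
```

With the SQL query from the answer you would not write this logic yourself. The query only states what to compute, and Flink decides how the work is spread over the workers.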

Upvotes: 1
