Veer

Reputation: 23

Spark read csv - multiple S3 paths in Java

I am trying to read multiple S3 directories (each directory contains multiple files) using Spark's read().csv() method, but I get an error saying the S3 path contains illegal characters. I have already checked the related questions, but none of them show a solution for Java, and I have not been able to translate their solutions to Java.

Dataset<Row> DocsTemp = null;
String scanResultFolder = "\"" + "s3a://somebucket/Dir1/" + "\",\"" + "s3a://somebucket/Dir2/" + "\"";
DocsTemp = spark.read().csv(scanResultFolder);

When I run this, Spark treats the entire string (scanResultFolder) as a single path and throws an error.

What is the correct way to read multiple paths?

Upvotes: 2

Views: 368

Answers (1)

Ben Watson

Reputation: 5521

csv() takes a String... (varargs) parameter, so you need to pass the paths as a String[] or as separate String arguments, not as one comma-separated String (see https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-java.lang.String...-):

DocsTemp = spark.read().csv(new String[] {"s3a://somebucket/Dir1/", "s3a://somebucket/Dir2/"});
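
If the paths arrive as a single comma-separated String (from a config value, say), one option is to split it into a String[] before calling csv(). Below is a minimal sketch using plain Java only. PathArgs, toPaths and countPaths are hypothetical names made up for illustration, and countPaths merely stands in for a varargs method such as csv(String... paths). It also assumes that none of the paths contain a comma themselves.

```java
import java.util.Arrays;

public class PathArgs {
    // Hypothetical helper: split a comma-separated list into individual paths,
    // trimming whitespace and dropping empty entries.
    static String[] toPaths(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    // Hypothetical stand-in for a varargs method like csv(String... paths):
    // it reports how many separate paths it received.
    static int countPaths(String... paths) {
        return paths.length;
    }

    public static void main(String[] args) {
        String joined = "s3a://somebucket/Dir1/, s3a://somebucket/Dir2/";

        // A single String is seen as ONE path, which is the problem in the question.
        System.out.println(countPaths(joined));   // prints 1

        // A String[] is spread across the varargs parameter.
        String[] paths = toPaths(joined);
        System.out.println(countPaths(paths));    // prints 2

        // With Spark on the classpath, the same array could be passed on:
        // Dataset<Row> docsTemp = spark.read().csv(paths);
    }
}
```

Passing the literals directly, as in spark.read().csv("s3a://somebucket/Dir1/", "s3a://somebucket/Dir2/"), should work just as well, since a varargs parameter accepts either form.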

Upvotes: 2
