Reputation: 23
I am trying to read multiple S3 directories (each containing multiple files) using Spark's read CSV method, but I get an error saying the S3 path has illegal characters. I have already checked related questions, but I don't see a solution for Java and was not able to adapt the existing answers to Java.
Dataset<Row> DocsTemp = null;
String scanResultFolder = "\"" + "s3a://somebucket/Dir1/" + "\",\"" + "s3a://somebucket/Dir2/" + "\"";
DocsTemp = spark.read().csv(scanResultFolder);
But when running, it treats the entire string (scanResultFolder) as a single path and throws an error.
Please suggest the correct way to achieve this.
Upvotes: 2
Views: 368
Reputation: 5521
You need to pass the paths as separate arguments (the method is varargs, `csv(String... paths)`), not as a single comma-separated String
(see https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-java.lang.String...-):
DocsTemp = spark.read().csv("s3a://somebucket/Dir1/", "s3a://somebucket/Dir2/");
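Note that in Java a bare `{...}` array initializer is not valid as a method argument; with a varargs method you either pass the values as separate arguments or wrap them in `new String[]{...}`. A minimal sketch of the same pattern with a plain varargs method (Spark itself omitted; `readCsv` is a hypothetical stand-in for `DataFrameReader.csv(String... paths)`):

```java
public class VarargsDemo {
    // Hypothetical stand-in mirroring the shape of DataFrameReader.csv(String... paths)
    static String readCsv(String... paths) {
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        // Valid: pass the paths as separate varargs arguments
        String a = readCsv("s3a://somebucket/Dir1/", "s3a://somebucket/Dir2/");
        // Also valid: an explicit array, using the new String[]{...} syntax
        String b = readCsv(new String[]{"s3a://somebucket/Dir1/", "s3a://somebucket/Dir2/"});
        // Both calls see the same two paths
        System.out.println(a.equals(b));
    }
}
```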
Upvotes: 2