Reputation: 477
The following snippet of code processes the filters in parallel and writes out individual files to the output directory. Is there a way to get one large output file instead?
Array(
  (filter1, outputPathBase + fileName),
  (filter2, outputPathBase + fileName),
  (filter3, outputPathBase + fileName)
).par.foreach {
  case (extract, path) => extract.coalesce(1).write.mode("append").csv(path)
}
Thank you.
Upvotes: 0
Views: 171
Reputation: 3863
You can reduce the array into a single RDD by unioning them; Spark will then parallelize the execution of each filter* itself:
val rdd = Array(
  filter1,
  filter2,
  filter3).reduce(_.union(_))
rdd.write.mode("append").csv(path)
There is no need in this case to convert the Array to a ParArray.
I am assuming that filter1, filter2 and filter3 are all of the same type, RDD[T].
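Note that a plain union may still produce multiple part files per write. Since the question's `.write.mode("append").csv(...)` call belongs to the Dataset/DataFrame API rather than plain RDDs, here is a minimal self-contained sketch of the union-then-write approach with a `coalesce(1)` added to get a single CSV part file. The `SparkSession` setup, the toy DataFrames standing in for filter1–filter3, and the output path are all assumptions for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UnionWriteExample {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; in a real job this already exists.
    val spark = SparkSession.builder()
      .appName("union-write")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for filter1/filter2/filter3 from the question;
    // they must all share the same schema for union to succeed.
    val filter1: DataFrame = Seq((1, "a")).toDF("id", "value")
    val filter2: DataFrame = Seq((2, "b")).toDF("id", "value")
    val filter3: DataFrame = Seq((3, "c")).toDF("id", "value")

    val path = "/tmp/combined-output" // hypothetical output path

    // Union all filters into one DataFrame, then coalesce to a single
    // partition so Spark writes exactly one CSV part file.
    Array(filter1, filter2, filter3)
      .reduce(_.union(_))
      .coalesce(1)
      .write
      .mode("append")
      .csv(path)

    spark.stop()
  }
}
```

`coalesce(1)` funnels the final write through one task, which is fine for modest output sizes but becomes a bottleneck for very large results; in that case it may be better to let Spark write multiple part files and merge them downstream.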
Upvotes: 1