Reputation: 149
I have an input folder that contains many files. I would like to do a batch operation on them like copy/move them to a new path.
I would like to do this using Spark.
Please help/suggest how to proceed on this.
Upvotes: 3
Views: 13925
Reputation: 246
You can read the files with val myfile = sc.textFile("file:///file-path")
if it is a local directory, and save them using myfile.saveAsTextFile("new-location").
It's also possible to save with compression: Link to ScalaDoc
What Spark will do is read all the files, batch them, and write them out to the new location (HDFS/local).
Make sure the same directory is available on every worker node of your Spark cluster.
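A minimal sketch of this approach, wrapped in a standalone app (the app name, input, and output paths below are placeholders, not from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.compress.GzipCodec

object BatchCopy {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("batch-copy"))

    // Reads every file in the directory as an RDD of text lines
    val myfile = sc.textFile("file:///input-dir")

    // Writes the lines back out; the optional codec class enables compression
    myfile.saveAsTextFile("file:///output-dir", classOf[GzipCodec])

    sc.stop()
  }
}
```

Note that saveAsTextFile produces a directory of part-files (one per partition), not a single file, and it fails if the output directory already exists.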
Upvotes: 5
Reputation: 139
In the case above, the same local file path has to exist on each worker node.
If you want to avoid that, you can use a distributed filesystem such as the Hadoop filesystem (HDFS).
In this case you have to give path like this:
hdfs://nodename-or-ip:port/path-to-directory
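With HDFS the code stays the same; only the URIs change. A short sketch (the namenode address and port are placeholders; common HDFS ports are 8020 or 9000, but yours may differ):

```scala
// Read from and write to HDFS instead of the local filesystem;
// every worker resolves these URIs against the same namenode
val myfile = sc.textFile("hdfs://nodename-or-ip:8020/path-to-directory")
myfile.saveAsTextFile("hdfs://nodename-or-ip:8020/new-location")
```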
Upvotes: 0