ND User

Reputation: 149

Move/Copy files in Spark hadoop

I have an input folder that contains many files. I would like to perform a batch operation on them, such as copying or moving them to a new path.

I would like to do this using Spark.

Please suggest how to proceed with this.

Upvotes: 3

Views: 13925

Answers (2)

hnahak

Reputation: 246

If the input is a local directory, you can read it with val myfile = sc.textFile("file://file-path") and save the contents with myfile.saveAsTextFile("new-location"). It is also possible to save with compression (see the saveAsTextFile overloads in the ScalaDoc).

What Spark will do is read all the files, batch them, and write them out to the new location (HDFS or local).

Make sure the same directory is available on each worker node of your Spark cluster.
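Putting the snippet above together, a minimal sketch (the paths and app name are placeholders, and this assumes Spark is on the classpath; in a spark-shell an sc is already provided):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BatchCopy {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs on the local machine; on a cluster you would submit normally.
    val conf = new SparkConf().setAppName("BatchCopy").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // textFile accepts a directory and reads every file inside it.
    val myfile = sc.textFile("file:///path/to/input-dir")

    // Writes the contents back out as part-* files under the new location.
    myfile.saveAsTextFile("file:///path/to/new-location")

    // Optionally compress the output, e.g. with gzip:
    // myfile.saveAsTextFile("file:///path/to/compressed",
    //   classOf[org.apache.hadoop.io.compress.GzipCodec])

    sc.stop()
  }
}
```

Note that this rewrites the data as part-* files under the target directory rather than preserving the original file names.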

Upvotes: 5

kbt

Reputation: 139

In the case above, the local file path must exist on each worker node.

If you want to avoid that, you can use a distributed filesystem such as the Hadoop Distributed File System (HDFS).

In this case you have to give the path like this:

hdfs://nodename-or-ip:port/path-to-directory
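The same read/save calls work unchanged with an hdfs:// URI. Also worth noting (this is not from the answers above, so treat it as a suggestion): if you only need to move or copy files as-is rather than rewrite them through an RDD, the Hadoop FileSystem API can do it directly. A sketch, with a hypothetical namenode address and placeholder paths:

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Connect to the cluster's namenode (host and port are placeholders).
val hadoopConf = new Configuration()
val fs = FileSystem.get(new URI("hdfs://namenode:8020"), hadoopConf)

// A move within the same filesystem is a metadata-only rename.
val moved = fs.rename(new Path("/path/to/input-dir"), new Path("/path/to/new-location"))

// A copy that keeps the source (deleteSource = false):
val copied = FileUtil.copy(fs, new Path("/path/to/input-dir"),
  fs, new Path("/path/to/copy-location"), false, hadoopConf)
```

Unlike the saveAsTextFile approach, this preserves the original file names and does not repartition the data.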

Upvotes: 0
