Reputation: 23
I have an instance of Apache Zeppelin running on a remote server, and I'm using Scala to communicate with it via the Spark interpreter.
I would like to transfer a CSV file that is stored in a directory on that server to HDFS (Hadoop), which is also on a remote server.
I don't have access to any of the configuration files on the server, I am not able to install anything, and I am only able to run commands from within Zeppelin.
I have tried to use the standard
sc.textFile("file:///test.csv")
statement, but it returns the following error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 19.0 failed 4 times, most recent failure: Lost task 0.3 in stage 19.0 (TID 64, 10.244.79.7): java.io.FileNotFoundException: File file:/test.csv does not exist
I have been told that I get this error because Spark cannot see my Zeppelin file system. I am not sure how to enable something like that.
Any advice would be super helpful.
Upvotes: 2
Views: 2515
Reputation: 198
If you are trying to read a local file in Zeppelin, make sure to put that file inside the Zeppelin installation folder, as Zeppelin is not able to access files outside of it.
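A minimal sketch of that approach, assuming Zeppelin is installed under /opt/zeppelin (a hypothetical path; substitute your own installation folder):
// /opt/zeppelin is a placeholder for your Zeppelin installation folder
val lines = sc.textFile("file:///opt/zeppelin/test.csv")
lines.take(5).foreach(println)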
Upvotes: 1
Reputation: 689
You can try:
sc.textFile("hdfs://DNS:PORT/test.csv")
where DNS is the address of your Hadoop cluster's name node and PORT is the port on which HDFS is listening. The default port depends on your Hadoop distribution; a common value is 8020. You can check it in core-site.xml, in the parameter fs.default.name or fs.defaultFS, depending on your Hadoop version.
An example request can look like:
sc.textFile("hdfs://address:8020/test.csv")
Upvotes: 2