Lucky Ning

Reputation: 111

sc.addFile throws java.io.FileNotFoundException for an HDFS file

I have run into a confusing problem. I want to distribute an HDFS file to all Spark workers. The code is as follows:

import sys 
import os
from pyspark.ml.feature import Word2Vec
from pyspark import SparkConf, SparkContext
from pyspark.sql import Row 
import jieba.posseg as posseg 
import jieba
if __name__ == "__main__":
    reload(sys)  
    sys.setdefaultencoding('utf-8')
    conf = SparkConf().setAppName('fenci_0')
    sc = SparkContext(conf=conf)
    date = '20180801' 
    scatelist = ['95']
    # I want to add an HDFS file to all Spark workers
    hdfs_file_path = '/home/a/part-00000'
    sc.addFile(hdfs_file_path)
    ...
    ...

But it fails with an error like "java.io.FileNotFoundException: Added file file does not exist".

I can access hdfs_file_path and read the file's content, so why does this occur? My guess is that when adding an HDFS file, sc.addFile requires a prefix, something like sc.addFile('hdfs://' + hdfs_file_path)?

I have searched Google and Stack Overflow, but perhaps I did not use the right keywords. Could you help me find the error? Thanks a lot.

Upvotes: 2

Views: 1154

Answers (1)

pri

Reputation: 1531

Yes.

You need to give the full HDFS URI, including the scheme and the name node, something like this:

sc.addFile('hdfs://<reference_to_name_node_or_name_service_ID>/home/a/part-00000')

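This is because sc.addFile() accepts a path on any supported filesystem: a local file, a file in HDFS (or another Hadoop-supported filesystem), or an HTTP, HTTPS or FTP URI. A path with no scheme, such as '/home/a/part-00000', is treated as a local file on the driver, which is why you get the FileNotFoundException.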
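If the path sometimes arrives with a scheme and sometimes without, a small helper can normalize it before calling addFile. This is only a sketch: to_hdfs_uri and add_hdfs_file are hypothetical names of my own, and name_node is a placeholder you must replace with your own name node address or name service ID.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def to_hdfs_uri(path, name_node):
    """Prefix hdfs://<name_node> unless the path already carries a scheme.

    name_node is a placeholder for your name node address or name service ID.
    """
    if urlparse(path).scheme:
        # Already a full URI (hdfs://, file://, http://, ...): leave it alone.
        return path
    return "hdfs://" + name_node + "/" + path.lstrip("/")


def add_hdfs_file(sc, path, name_node):
    """Hypothetical wrapper: normalize the path, then hand it to sc.addFile."""
    uri = to_hdfs_uri(path, name_node)
    sc.addFile(uri)
    return uri
```

With the SparkContext from the question you would call add_hdfs_file(sc, '/home/a/part-00000', '<your_name_node>'). Inside a task, SparkFiles.get('part-00000') (from pyspark import SparkFiles) then returns the local path of the downloaded copy on that worker.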

Upvotes: 3
