Reputation: 1
I'm trying to write a pandas DataFrame to the local file system or to HDFS with Spark in cluster mode, but it throws an error like:
IOError: [Errno 2] No such file or directory: {hdfs_path/file_name.txt}
This is how I'm writing it:
df.to_csv("hdfs_path/file_name.txt", sep="|")
I am using Python, and the job is launched through a shell script.
This works fine in local mode but fails in yarn-cluster mode.
Any help is welcome; thanks in advance.
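For context, pandas' `to_csv` only writes through the local OS filesystem of the machine running the code, so an HDFS destination is not a path it can open; in yarn-cluster mode the driver runs on an arbitrary cluster node where that path does not exist. One common workaround, sketched below under the assumption that the `hdfs` CLI is on the PATH of the node running the driver (the `hdfs_path/` destination is a placeholder), is to write locally first and then push the file to HDFS:

```python
import os
import shutil
import subprocess
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# pandas can only write to a local filesystem path on the machine
# running this code -- it cannot talk to HDFS directly.
local_path = os.path.join(tempfile.mkdtemp(), "file_name.txt")
df.to_csv(local_path, sep="|", index=False)

# Then copy the local file into HDFS, if the hdfs CLI is available.
# "hdfs_path/" is a placeholder for the real destination directory.
if shutil.which("hdfs"):
    subprocess.check_call(["hdfs", "dfs", "-put", "-f", local_path, "hdfs_path/"])
```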
Upvotes: 0
Views: 1575
Reputation: 11
I had the same issue; I always convert the pandas DataFrame into a Spark DataFrame before writing the file out to HDFS:
# Convert the pandas DataFrame to a Spark DataFrame, then write it out.
df_sp = spark.createDataFrame(df_pd)
# coalesce(1) yields a single part file inside the "my_file.csv" output directory.
df_sp.coalesce(1).write.csv("my_file.csv", mode="overwrite", header=True)
Upvotes: 1