Srujan

Reputation: 1

Writing a pandas DataFrame (.csv) to the local system or HDFS with Spark in cluster mode

I'm trying to write a pandas DataFrame to the local system or to HDFS with Spark in cluster mode, but it throws an error like:

IOError: [Errno 2] No such file or directory: {hdfs_path/file_name.txt}

This is how I'm writing it:

df.to_csv("hdfs_path/file_name.txt", sep="|")

I am using Python, and the job is launched through a shell script.

This works fine in local mode but fails in yarn-cluster mode.

Any help is welcome, and thanks in advance.

Upvotes: 0

Views: 1575

Answers (1)

marinou

Reputation: 11

I have the same issue. I always convert the pandas DataFrame into a Spark DataFrame before creating the file, so that Spark performs the write on its own filesystem (for example HDFS) instead of pandas:

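# Convert the pandas DataFrame (df_pd) so that Spark, not pandas, does the write
df_sp = spark.createDataFrame(df_pd)
# coalesce(1) gives a single part file inside the "my_file.csv" output directory
df_sp.coalesce(1).write.csv("my_file.csv", mode='overwrite', header=True)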
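As for why the original to_csv call fails: as far as I can tell, pandas treats a plain string path as an ordinary local file, so in yarn-cluster mode the path is resolved on whichever node runs the driver, where that directory probably does not exist. Here is a minimal sketch that reproduces the same Errno 2 using only the standard library. The helper name try_local_write and the temporary directory layout are made up for illustration, and it assumes the failure comes from a missing local directory rather than from anything HDFS-specific.

```python
import errno
import os
import tempfile


def try_local_write(path, text):
    """Write text with a plain local open(); return the errno on failure, else None."""
    try:
        with open(path, "w") as handle:
            handle.write(text)
    except OSError as exc:  # IOError is an alias of OSError in Python 3
        return exc.errno
    return None


# Hypothetical layout: "hdfs_path" is never created on the local disk,
# much like an HDFS directory that is not visible to the local filesystem.
workdir = tempfile.mkdtemp()
missing_dir_path = os.path.join(workdir, "hdfs_path", "file_name.txt")
existing_dir_path = os.path.join(workdir, "file_name.txt")

missing_result = try_local_write(missing_dir_path, "a|b\n1|2\n")
existing_result = try_local_write(existing_dir_path, "a|b\n1|2\n")

# The first write fails with ENOENT (Errno 2); the second one succeeds.
print(missing_result == errno.ENOENT, existing_result)
```

If this is indeed the cause, letting Spark do the write (as above) avoids the problem, because Spark resolves the path against the cluster's filesystem rather than the local disk of a single node.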

Upvotes: 1
