Reputation: 164
I am trying to write a Spark DataFrame to S3 using PySpark and spark-csv with the following code:
df1.filter(df1['y'] == 2)\
    .withColumnRenamed("x", 'a')\
    .select("a", "b", "c")\
    .write\
    .format('com.databricks.spark.csv')\
    .options(header="true")\
    .options(codec="org.apache.hadoop.io.compress.BZip2Codec")\
    .save('s3://bucket/abc/output')
but I am getting an error saying "output dir already exists". I am sure the output dir did not exist before the job started, and I tried running with a different output dir name, but the write still fails.
If I look at the S3 bucket after the job fails, I see that a few part files were written by Spark, but it fails when it tries to write more. The script runs fine locally; on the AWS cluster I am using 10 Spark executors. Does anyone have any idea what is wrong with this code?
Upvotes: 1
Views: 1805
Reputation: 36
Try the code below; it should fix the problem. Internally, Spark uses the Hadoop API to check whether the output path already exists, and setting the write mode to overwrite tells it to replace any existing output instead of failing. Also check the executor logs; you may find something useful there.
df1.filter(df1['y'] == 2)\
    .withColumnRenamed("x", 'a')\
    .select("a", "b", "c")\
    .write\
    .mode('overwrite')\
    .format('com.databricks.spark.csv')\
    .options(header="true")\
    .options(codec="org.apache.hadoop.io.compress.BZip2Codec")\
    .save('s3://bucket/abc/output')
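Note that in PySpark, mode() takes a string (there is no bare Overwrite identifier as in the Scala API). For reference, a minimal sketch of the save modes DataFrameWriter accepts, assuming the same df1 and output path as above:

# Save modes accepted by DataFrameWriter.mode() in PySpark:
#   'error'     - (default) fail if the output path already exists
#   'overwrite' - delete any existing output at the path, then write
#   'append'    - add new files alongside any existing output
#   'ignore'    - silently skip the write if the path already exists
df1.write\
    .mode('overwrite')\
    .format('com.databricks.spark.csv')\
    .options(header="true")\
    .save('s3://bucket/abc/output')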
Upvotes: 1