Ryan

Reputation: 1232

Spark writing compressed CSV with custom path to S3

I'm simply trying to write a CSV to S3 using Spark in Scala:

I notice in my output bucket the following file: ...PROCESSED/montfh-04.csv/part-00000-723a3d72-56f6-4e62-b627-9a181a820f6a-c000.csv.snappy

when it should only be montfh-04.csv

Code:

    import spark.implicits._  // needed for .toDF() on a local Seq

    val processedMetadataDf = spark.read.csv("s3://" + metadataPath + "/PROCESSED/" + "month-04" + ".csv")
    val processCount = processedMetadataDf.count()
    if (processCount == 0) {
        // Initial frame is 0B -> overwrite with a single dummy row
        val newDat = Seq("dummy-row-data")
        val unknown_df = newDat.toDF()
        unknown_df.write.mode("overwrite").option("header", "false").csv("s3://" + metadataPath + "/PROCESSED/" + "montfh-04" + ".csv")
    }

Here I notice two strange things:

- Spark wrote a *directory* named montfh-04.csv containing a part file, rather than a single file with that name
- the output is snappy-compressed, even though I never asked for compression

All I am trying to do is write a flat CSV file with that name to the specified path. What are my options?

Upvotes: 0

Views: 411

Answers (1)

Akshit Methi

Reputation: 21

This is how Spark works. The location you provide when saving a Dataset/DataFrame is a directory into which Spark writes all of its partitions. The number of part files equals the number of partitions, which in your case is 1.

Now, if you want the file to be named montfh-04.csv exactly, you can rename it after the write.
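
For example, a minimal sketch of that approach, building on the question's variables (unknown_df, metadataPath, spark): force a single partition, disable compression so the output is plain CSV, write to a temporary directory, then rename the single part file with the Hadoop FileSystem API. The temporary directory name montfh-04-tmp is made up for illustration:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}

    // Force one partition and disable compression so the output is plain CSV.
    unknown_df.coalesce(1)
        .write.mode("overwrite")
        .option("header", "false")
        .option("compression", "none")
        .csv("s3://" + metadataPath + "/PROCESSED/montfh-04-tmp")

    // Find the single part file Spark produced and rename it to the desired key.
    val fs = FileSystem.get(new URI("s3://" + metadataPath), spark.sparkContext.hadoopConfiguration)
    val partFile = fs.globStatus(new Path("s3://" + metadataPath + "/PROCESSED/montfh-04-tmp/part-*.csv"))(0).getPath
    fs.rename(partFile, new Path("s3://" + metadataPath + "/PROCESSED/montfh-04.csv"))
    fs.delete(new Path("s3://" + metadataPath + "/PROCESSED/montfh-04-tmp"), true)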

Note: renaming in S3 is a costly operation (a copy followed by a delete). Since you are writing with Spark, you pay roughly three times the I/O: two for the output commit operation (which itself copies the data from a temporary location) and one more for the rename. It is better to write the file to HDFS and upload it from there under the required key name.
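
A sketch of that HDFS-first approach, assuming a staging directory on HDFS (the hdfs:///tmp/montfh-04-staging path is hypothetical) and using Hadoop's FileUtil.copy to push the part file to S3 under the exact key:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

    // Hypothetical staging directory on HDFS; adjust to your cluster layout.
    val stagingDir = "hdfs:///tmp/montfh-04-staging"

    // Write a single uncompressed part file to HDFS first (cheap rename semantics).
    unknown_df.coalesce(1)
        .write.mode("overwrite")
        .option("header", "false")
        .option("compression", "none")
        .csv(stagingDir)

    val conf = spark.sparkContext.hadoopConfiguration
    val hdfs = FileSystem.get(new URI(stagingDir), conf)
    val s3 = FileSystem.get(new URI("s3://" + metadataPath), conf)

    // Copy the single part file to S3 under the exact key, then remove the staging dir.
    val partFile = hdfs.globStatus(new Path(stagingDir + "/part-*.csv"))(0).getPath
    FileUtil.copy(hdfs, partFile, s3, new Path("s3://" + metadataPath + "/PROCESSED/montfh-04.csv"), true, conf)
    hdfs.delete(new Path(stagingDir), true)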

Upvotes: 2
