aarafeh

Reputation: 1

spark-sftp: DataFrame is saved to FTP incorrectly

I am struggling with spark-sftp. I am reading from an Oracle DB and then want to write the output data (from a DataFrame) to an FTP server. Everything else works, but why is it copying a file called .part-XXX.csv.crc instead of the .csv file?

Here is the code :

val jdbcSqlConnStr = "jdbc:oracle:thin://@Server:1601/WW"

// Subquery used as the table: the first 100 rows of PROGRAM
val jdbcDbTable = "(select CAST(ID as INT) Program_ID, Program_name from PROGRAM WHERE ROWNUM <=100) P"

val jdbcDF = sqlContext.read.format("jdbc").options(
  Map(
    "url" -> jdbcSqlConnStr,
    "dbtable" -> jdbcDbTable,
    "driver" -> "oracle.jdbc.driver.OracleDriver",
    "user" -> "user",
    "password" -> "pass"
  )).load

jdbcDF.write.
  format("com.springml.spark.sftp").
  option("host", "ftp.Server.com").
  option("username", "user").
  option("password", "*****").
  option("fileType", "csv").
  option("delimiter", "|").
  save("/Test/sample.csv")

But the output file uploaded to FTP is binary and I found this in console output:

18/02/08 17:08:43 INFO FileOutputCommitter: Saved output of task 'attempt_20180208170840_0000_m_000000_0' to file:/C:/Users/aarafeh/AppData/Local/Temp/spark_sftp_connection_temp286/_temporary/0/task_20180208170840_0000_m_000000
18/02/08 17:08:43 INFO SparkHadoopMapRedUtil: attempt_20180208170840_0000_m_000000_0: Committed
18/02/08 17:08:43 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1565 bytes result sent to driver
18/02/08 17:08:43 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3591 ms on localhost (executor driver) (1/1)
18/02/08 17:08:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/02/08 17:08:43 INFO DAGScheduler: ResultStage 0 (csv at DefaultSource.scala:243) finished in 3.611 s
18/02/08 17:08:43 INFO DAGScheduler: Job 0 finished: csv at DefaultSource.scala:243, took 3.814856 s
18/02/08 17:08:44 INFO FileFormatWriter: Job null committed.
18/02/08 17:08:44 INFO DefaultSource: Copying C:\Users\aarafeh\AppData\Local\Temp\spark_sftp_connection_temp286.part-00000-1efdd0f1-8201-49b4-af15-5878204e57ea-c000.csv.crc to /J28446_Engage/Test/sample.csv
18/02/08 17:08:46 INFO SFTPClient: Copying files from C:\Users\aarafeh\AppData\Local\Temp\spark_sftp_connection_temp286.part-00000-1efdd0f1-8201-49b4-af15-5878204e57ea-c000.csv.crc to /J28446_Engage/Test/sample.csv
18/02/08 17:08:47 INFO SFTPClient: Copied files successfully...

The file (sample.csv) was uploaded successfully, but its content is binary because the .crc file was uploaded instead of the CSV.

Any idea why this happens and how to solve it?

Upvotes: 0

Views: 2369

Answers (1)

aarafeh

Reputation: 1

I raised this as an issue on the spark-sftp project, as shown here:

https://github.com/springml/spark-sftp/issues/18

and they fixed it.

Thanks.
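
For readers wondering where the .crc file comes from: Hadoop's local filesystem writes a hidden checksum sidecar (named like .part-XXX.csv.crc) next to every part file, and the log in the question shows that sidecar being chosen for upload. The sketch below is not the project's actual fix, which I have not inspected. It only illustrates one way to select the real data file from such a temp directory. PartFilePicker and pickDataFile are hypothetical names, and the file names in main are made up for the demo.

```scala
import java.io.File
import java.nio.file.Files

object PartFilePicker {
  // Hypothetical helper, not part of spark-sftp: returns the first data file
  // with the given extension, skipping hidden sidecars such as
  // ".part-XXX.csv.crc" and marker files such as "_SUCCESS".
  def pickDataFile(dir: File, extension: String): Option[File] = {
    // listFiles() returns null if dir is not a readable directory
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
    files
      .filter(_.isFile)
      .filterNot(f => f.getName.startsWith(".") || f.getName.startsWith("_"))
      .filter(_.getName.endsWith("." + extension))
      .sortBy(_.getName)
      .headOption
  }

  def main(args: Array[String]): Unit = {
    // Simulate the layout Spark leaves behind after writing a single CSV part
    val dir = Files.createTempDirectory("spark_sftp_demo").toFile
    Seq("part-00000-demo-c000.csv", ".part-00000-demo-c000.csv.crc", "_SUCCESS")
      .foreach(name => new File(dir, name).createNewFile())

    println(pickDataFile(dir, "csv").map(_.getName))
    // prints Some(part-00000-demo-c000.csv)
  }
}
```

Matching on the file-name suffix and rejecting names that start with "." or "_" keeps the selection independent of the order in which the directory listing returns its entries.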

Upvotes: 0
