Reputation: 1
I am struggling with spark-sftp. I am reading from an Oracle DB and then want to write the output data (from a DataFrame) to an FTP server. Everything else works, but why is it copying a file called part-XXX.csv.crc instead of the .csv file?
Here is the code:
val jdbcSqlConnStr = "jdbc:oracle:thin://@Server:1601/WW"
// Subquery used as the JDBC "table"; kept on one line because a plain Scala string literal cannot span lines
val jdbcDbTable = "(select CAST(ID as INT) Program_ID, Program_name from PROGRAM WHERE ROWNUM <=100) P"
// Read the query result from Oracle into a DataFrame
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> jdbcSqlConnStr,
    "dbtable" -> jdbcDbTable,
    "driver" -> "oracle.jdbc.driver.OracleDriver",
    "user" -> "user",
    "password" -> "pass"
  )).load
// Write the DataFrame as a pipe-delimited CSV through the spark-sftp connector
jdbcDF.write.
  format("com.springml.spark.sftp").
  option("host", "ftp.Server.com").
  option("username", "user").
  option("password", "*****").
  option("fileType", "csv").
  option("delimiter", "|").
  save("/Test/sample.csv")
But the output file uploaded to the FTP server is binary, and I found this in the console output:
18/02/08 17:08:43 INFO FileOutputCommitter: Saved output of task 'attempt_20180208170840_0000_m_000000_0' to file:/C:/Users/aarafeh/AppData/Local/Temp/spark_sftp_connection_temp286/_temporary/0/task_20180208170840_0000_m_000000
18/02/08 17:08:43 INFO SparkHadoopMapRedUtil: attempt_20180208170840_0000_m_000000_0: Committed
18/02/08 17:08:43 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1565 bytes result sent to driver
18/02/08 17:08:43 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3591 ms on localhost (executor driver) (1/1)
18/02/08 17:08:43 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/02/08 17:08:43 INFO DAGScheduler: ResultStage 0 (csv at DefaultSource.scala:243) finished in 3.611 s
18/02/08 17:08:43 INFO DAGScheduler: Job 0 finished: csv at DefaultSource.scala:243, took 3.814856 s
18/02/08 17:08:44 INFO FileFormatWriter: Job null committed.
18/02/08 17:08:44 INFO DefaultSource: Copying C:\Users\aarafeh\AppData\Local\Temp\spark_sftp_connection_temp286.part-00000-1efdd0f1-8201-49b4-af15-5878204e57ea-c000.csv.crc to /J28446_Engage/Test/sample.csv
18/02/08 17:08:46 INFO SFTPClient: Copying files from C:\Users\aarafeh\AppData\Local\Temp\spark_sftp_connection_temp286.part-00000-1efdd0f1-8201-49b4-af15-5878204e57ea-c000.csv.crc to /J28446_Engage/Test/sample.csv
18/02/08 17:08:47 INFO SFTPClient: Copied files successfully...
The file (sample.csv) was uploaded successfully, but its content is binary because the connector uploaded the .crc file instead of the CSV.
Any idea why this happens and how to fix it?
Upvotes: 0
Views: 2369
Reputation: 1
I reported this as an issue on the spark-sftp project, and the maintainers fixed it:
https://github.com/springml/spark-sftp/issues/18
Thanks.
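For anyone stuck on an older version: the .crc file in the log is the checksum sidecar that Hadoop's local filesystem writes next to each part file, so the temp directory holds both part-XXX.csv and a hidden .part-XXX.csv.crc. Below is a rough sketch of how the real data file could be picked out of such a directory. `findDataFile` and the demo file names are hypothetical and only for illustration; this is not the connector's actual code, and I have not checked how the fix in the issue was implemented.

```scala
import java.io.File
import java.nio.file.Files

object PickPartFile {
  // Hypothetical helper (not part of spark-sftp): return the first real data
  // file in `dir`, skipping Hadoop's hidden checksum sidecars (".*.crc") and
  // marker files such as "_SUCCESS".
  def findDataFile(dir: File, extension: String): Option[File] = {
    // listFiles() returns null if `dir` is not a readable directory
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
    files
      .filter(_.isFile)
      .filter { f =>
        val name = f.getName
        name.startsWith("part-") && name.endsWith("." + extension)
      }
      .sortBy(_.getName)
      .headOption
  }

  def main(args: Array[String]): Unit = {
    // Simulate the layout Spark leaves behind on a local filesystem
    // (file names here are made up for the demo).
    val dir = Files.createTempDirectory("spark_sftp_demo").toFile
    Seq("_SUCCESS", "._SUCCESS.crc", "part-00000-demo.csv", ".part-00000-demo.csv.crc")
      .foreach(name => new File(dir, name).createNewFile())

    println(findDataFile(dir, "csv").map(_.getName))
  }
}
```

Matching on the "part-" prefix rather than only on the extension matters here, because the checksum file name begins with a dot and ends in ".crc", so a loose match on ".csv" anywhere in the name would still catch it.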
Upvotes: 0