Reputation: 3199
I am trying to retrieve data from oracle using spark-sql-2.4.1 version. I tried to set the JdbcOptions as below :
.option("lowerBound", "31-MAR-02");
.option("upperBound", "01-MAY-19");
.option("partitionColumn", "data_date");
.option("numPartitions", 240);
But gives error :
java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
at java.sql.Timestamp.valueOf(Timestamp.java:204)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:179)
Then tried as below
.option("lowerBound", "2002-03-31"); //changed the date format
.option("upperBound", "2019-05-02");
.option("partitionColumn", "data_date");
.option("numPartitions", 240);
Still no luck. So what is the correct way to pass the date as "lower/upperBound"? Is there a way to specify/set option parameter data Type ?
Part-2 Checked the options properly. they were overwritten in between before executing the query. So corrected it. ... now that error resolved.
But for below options:
.option("lowerBound", "2002-03-31 00:00:00");
.option("upperBound", "2019-05-01 23:59:59");
.option("timestampFormat", "yyyy-mm-dd hh:mm:ss");
Query String :
query -> ( SELECT * FROM MODEL_VALS ) T
It is throwing another error :
java.sql.SQLException: ORA-12801: error signaled in parallel query server P022, instance nj0005
ORA-01861: literal does not match format string
Upvotes: 10
Views: 13842
Reputation: 530
If you are using Oracle, see: apache/spark/.../OracleIntegrationSuite.scala
val df1 = spark.read.format("jdbc")
.option("url", jdbcUrl)
.option("dbtable", "datetimePartitionTest")
.option("partitionColumn", "d")
.option("lowerBound", "2018-07-06")
.option("upperBound", "2018-07-20")
.option("numPartitions", 3)
// oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
// (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail bound evaluation of
// the partition column. E.g. 2018-07-06 cannot be evaluated as Timestamp, and the error
// message says: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff].
.option("oracle.jdbc.mapDateToTimestamp", "false")
.option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
.load()
Upvotes: 15
Reputation: 519
I stumbled on this question as I am solving a similar problem. But in this case Spark 2.4.2 is sending date in format 'yyyy-MM-dd HH:mm:ss.ssss' to Oracle and it returned "Not a valid month" as it expects 'dd-MMM-yy HH:mm:ss.ssss'. To solve that I followed: Spark GitHub Link , it says:
Override beforeFetch method in OracleDialect to finish the following two things:
Set Oracle's NLS_TIMESTAMP_FORMAT to "YYYY-MM-DD HH24:MI:SS.FF" to match java.sql.Timestamp format. Set Oracle's NLS_DATE_FORMAT to "YYYY-MM-DD" to match java.sql.Date format.
And it solved the issue. Hope it helps.
Upvotes: 3
Reputation: 6871
all of the following options must be set in this way in order for it to work:
spark.read
.option("header", true)
.option("inferSchema", true)
.option("timestampFormat", "MM/dd/yyyy h:mm:ss a")
.csv("PATH_TO_CSV")
Upvotes: 1
Reputation: 87249
The given parameters have type timestamp, but you're providing the only date. Timestamp has format as yyyy-mm-dd hh:mm:ss
, so you need to provide your dates as 2002-03-31 00:00:00
and 2019-05-01 23:59:59
correspondingly...
Upvotes: 2