Reputation: 83
I have a SQL statement that I want to run against an oracle database using a JDBC driver in databricks. I can get this to successfully run if the SQL statement is quite short, for example if it's just selecting all of the data from a table with no filters etc.. (e.g. select * from tbl)
However, I have an extremely long SQL code that I need to execute so I am creating the string to pass through to the JDBC driver by loading it from a .sql file saved on the databricks file storage.
When running this I was presented with an error and on investigation / printing the results of the text file I find it drops some of the SQL statement and provides a message before resuming the sql statement:
*** WARNING: skipped 62431 bytes of output ***
Effectively it looks like this in the printed string:
sum (
case
when dpr.pricing_Type in ('P', 'C') then
nvl (
decode (dpr.price / 100, null, 0,
decode (apr.price, 'N',
*** WARNING: skipped 62431 bytes of output ***
then
dpr.percentage_applied
else
0
end
) as price_percent,
Note that the code prior to the warning is for a completely different field to that of the code after the warning message.
Are there any suggestions on the cause of this and how to resolve it?
The full script I am running is below for reference, also note that the .sql file I am using is only 113kb and I am using python 3.7.5 via runtime 7.4 of databricks:
%python
# setup jdbc credentials (from key vault) and url
jdbcUsername = dbutils.secrets.get(scope="USER", key="ID")
jdbcPassword = dbutils.secrets.get(scope="PWD", key="PWD")
jdbcUrl = "jdbc:oracle:thin:@<REDACTED>"
jdbcDrv = "oracle.jdbc.driver.OracleDriver"
# Table Name
OutputTbl = "db.tblCore"
# Drop table.
spark.sql("DROP TABLE IF EXISTS " + OutputTbl )
# parallelism
lbound = 20160101
ubound = 20210115
fsize = "1000"
colname = "date_value_yyyymmdd"
numParts = "10"
# Get sql stetment from file.
with open('/dbfs/FileStore/shared_uploads/<REDACTED>/SQL', 'r') as f:
sql = file.read()
# Create DF and write output to a table.
spdf = (spark.read.format("jdbc")
.option("driver", jdbcDrv)
.option("url", jdbcUrl)
.option("user", jdbcUsername)
.option("password", jdbcPassword)
.option("dbtable", sql)
.option("numPartitions", numParts)
.option("fetchsize", fsize)
.option("partitionColumn", colname)
.option("lowerBound", lbound)
.option("upperBound", ubound)
.load())
spdf.write.mode("overwrite").saveAsTable(OutputTbl)
Upvotes: 1
Views: 1821
Reputation: 87184
This is not an error, it's just a warning that says that the output was truncated to prevent overloading of the browser, etc. You may look into driver's & executors log via Spark UI of your cluster - there should be more information...
I would also suggest to first try to execute that statement directly against Oracle, just to check if it works at all
Upvotes: 1