Akash agrawal

Reputation: 11

Error while inserting data into a temporary table using a JDBC connection

I am writing an AWS Glue ETL job in which I create a JDBC connection to push data from a DataFrame into an MSSQL table.

    jdbc_url = "jdbc:sqlserver://{host}:{port};database={databaseName};user={username};password={password}".format(
        host=args[host],
        port=args[port],
        databaseName=args[name],
        username=args[username],
        password=args[password],
    )
    file_data.write.jdbc(url=jdbc_url, table="##temptable", mode="overwrite")
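For reference, the URL template used above can be checked in isolation, outside of Glue and Spark. The host, database, and credential values below are hypothetical placeholders, not the job's real arguments:

```python
# Minimal sketch: verify the SQL Server JDBC URL template in isolation.
# All values are hypothetical placeholders standing in for the Glue job args.
args = {
    "host": "myserver.example.com",
    "port": "1433",
    "name": "mydb",
    "username": "etl_user",
    "password": "secret",
}

jdbc_url = (
    "jdbc:sqlserver://{host}:{port};database={databaseName};"
    "user={username};password={password}"
).format(
    host=args["host"],
    port=args["port"],
    databaseName=args["name"],
    username=args["username"],
    password=args["password"],
)

print(jdbc_url)
# jdbc:sqlserver://myserver.example.com:1433;database=mydb;user=etl_user;password=secret
```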

This code works fine with a small file but throws an error for bigger files:

java.sql.BatchUpdateException: Invalid object name '##temptable'.
        at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeBatch(SQLServerPreparedStatement.java:2101)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:713)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1(JdbcUtils.scala:868)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$saveTable$1$adapted(JdbcUtils.scala:867)
        at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2(RDD.scala:1011)
        at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$2$adapted(RDD.scala:1011)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2269)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:138)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1516)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
24/07/08 18:53:26 WARN TaskSetManager: Lost task 11.0 in stage 3.0 (TID 235) (8e735cbd09f8 executor driver): java.sql.BatchUpdateException: Invalid object name '##temptable'.
        [same stack trace as above]

How can I fix this error?

Upvotes: 1

Views: 56

Answers (0)
