Reputation: 251
I would like to write parquet files to PostgreSQL. I am using Spark, and to write the file I am using Spark DataFrame's write.jdbc function. Everything works fine for parquet column types like long, decimal, or text. The problem is with complex types like Map. I would like to store the Map as JSON in my PostgreSQL column. Since PostgreSQL can convert the text datatype into json automatically (using a cast operation), I am dumping the map to a JSON string first.
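Roughly, this is how I am turning the Map column into a JSON string before the write (a minimal sketch; the column name my_map_col is just a placeholder, and I use a plain UDF because to_json is not available in Spark 1.6):

import json
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Serialize the MapType column to a JSON string so it can be written as text
to_json_str = udf(lambda m: json.dumps(m) if m is not None else None, StringType())

data_frame = data_frame.withColumn("my_map_col", to_json_str(data_frame["my_map_col"]))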
But the Spark program complains that we are trying to insert a "character varying" datatype into a column of type "json", which makes it clear that PostgreSQL is not converting "character varying" into JSON automatically.
I went ahead, logged into my database, and manually tried to insert a JSON string into a JSON datatype column of a table, and it worked.
My question is: why is my Spark program complaining about the cast operation?
I am using Spark 1.6.1, PostgreSQL 4.3, and PostgreSQL JDBC driver 42.1.1.
Here is the code snippet:
url = "jdbc:postgresql://host_name:host_port/db_name"
data_frame.write.jdbc(url, table_name, properties={"user": user, "password": password})
Error stack trace:
Hint: You will need to rewrite or cast the expression.
Position: 66
Call getNextException to see other errors in the batch.
at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:148)
at org.postgresql.core.ResultHandlerDelegate.handleError(ResultHandlerDelegate.java:50)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2190)
at org.postgresql.core.v3.QueryExecutorImpl.flushIfDeadlockRisk(QueryExecutorImpl.java:1325)
at org.postgresql.core.v3.QueryExecutorImpl.sendQuery(QueryExecutorImpl.java:1350)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:458)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:791)
at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1547)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:215)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:277)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:276)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: org.postgresql.util.PSQLException: ERROR: column "value" is of type json but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 66
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2476)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2189)
... 18 more
Upvotes: 2
Views: 8391
Reputation: 289
Are you using AWS services? If yes, then crawl your file with an AWS Glue crawler, which creates a table in the Glue Data Catalog. Create a Glue job that takes this catalog table as input, and for the output select an AWS RDS JDBC connection and choose the required database. Run the job and your parquet file data will be loaded into the Postgres table.
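Something along these lines in the Glue job script should work (a rough sketch only; the catalog database, table, and connection names below are placeholders for your own):

from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the crawled parquet data from the Glue Data Catalog (placeholder names)
source = glue_context.create_dynamic_frame.from_catalog(
    database="my_catalog_db", table_name="my_parquet_table")

# Write to the RDS PostgreSQL database through the Glue JDBC connection (placeholder names)
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=source,
    catalog_connection="my_rds_connection",
    connection_options={"dbtable": "my_table", "database": "db_name"})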
Upvotes: 0
Reputation: 251
It's very late, but here is the answer for any lost souls.
You need to pass the "stringtype" argument to the JDBC driver. It specifies the type to use when binding PreparedStatement parameters set via setString(). By default it is varchar, which forces the parameter to be treated as varchar and prevents any cast operation (in my case, JSON string to JSON). If we specify stringtype=unspecified, it is left to the database to decide what type the argument is. In my case, Postgres then readily converts the string to JSON.
Documentation: https://jdbc.postgresql.org/documentation/head/connect.html
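For example, reusing the snippet from the question, the option can be passed either through the connection properties or appended to the JDBC URL (a sketch with the same placeholder names):

url = "jdbc:postgresql://host_name:host_port/db_name"

# Option 1: pass stringtype through the connection properties
data_frame.write.jdbc(url, table_name,
                      properties={"user": user, "password": password,
                                  "stringtype": "unspecified"})

# Option 2: append it to the JDBC URL instead
url = "jdbc:postgresql://host_name:host_port/db_name?stringtype=unspecified"
data_frame.write.jdbc(url, table_name, properties={"user": user, "password": password})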
Upvotes: 6