Reputation: 79
How to load a parquet file into vertica database using spark???
link (http://www.sparkexpert.com/2015/04/17/save-apache-spark-dataframe-to-database/)
I tried to load data frame(parquet files) using the above link into mysql it worked. But when i tried to load it into vertica database this is the error i am facing.The error below is because vertica db doesn’t support the datatypes(String) which is in the data frames(parquet file). I do not wanted to type cast the columns since its going to be a performance issue. we are looking to load around 280 million rows. Could you please suggest the best way to load the data into vertica db.
Exception in thread “main” java.sql.SQLSyntaxErrorException: [Vertica][VJDBC](5108) ERROR: Type “TEXT” does not exist
at com.vertica.util.ServerErrorData.buildException(Unknown Source)
at com.vertica.io.ProtocolStream.readExpectedMessage(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepareImpl(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
at com.vertica.dataengine.VDataEngine.prepare(Unknown Source)
at com.vertica.jdbc.common.SPreparedStatement.(Unknown Source)
at com.vertica.jdbc.jdbc4.S4PreparedStatement.(Unknown Source)
at com.vertica.jdbc.VerticaJdbc4PreparedStatementImpl.(Unknown Source)
at com.vertica.jdbc.VJDBCObjectFactory.createPreparedStatement(Unknown Source)
at com.vertica.jdbc.common.SConnection.prepareStatement(Unknown Source)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:275)
at org.apache.spark.sql.DataFrame.createJDBCTable(DataFrame.scala:1611)
at com.sparkread.SparkVertica.JdbctoVertica.main(JdbctoVertica.java:51)
Caused by: com.vertica.support.exceptions.SyntaxErrorException: [Vertica][VJDBC](5108) ERROR: Type “TEXT” does not exist
… 13 more
Upvotes: 2
Views: 1634
Reputation: 1
When the Vertica table exists with the same column names as the dataFrame (and the corresponding types, VARCHAR) the following has worked for me (while keeping vertica's jdbc):
myDataFrame.write().mode(SaveMode.Append).jdbc(url, "MY_VERTICA_TABLE", new Properties());
Upvotes: 0
Reputation: 7616
Since you are getting the error on the createJDBCTable
, you could just create the table yourself and use insertIntoJDBC
instead.
Another idea would be to try and set spark.sql.dialect
to Postgres since I noticed registerDialect(PostgresDialect)
in spark. That said, I don't know how to do this other than to use jdbc:postgresql
, but if you use that driver you would not get any advantage of a optimal insert that Vertica's JDBC driver would give you. You might need to modify here to allow it to use that dialect for jdbc:vertica
. If for some reason that doesn't work you'd need to add in a new dialect.
Personally I think the first option is simpler.
Upvotes: 1