uttam

Reputation: 455

PySpark DataFrame: write JDBC with dynamic creation of a table with a given schema

Is there a way to dynamically create a table with a given schema from a PySpark DataFrame, the way we can with the pandas DataFrame's to_sql method?

Similarly, can we create a table with a given schema when writing a PySpark DataFrame over JDBC?

df.write.jdbc("jdbc:postgresql://localhost:5432/postgres", "sample_data1", mode="overwrite", properties=prop)

In the above code, how can we specify the schema used to generate the table?

Upvotes: 2

Views: 4178

Answers (1)

Som

Reputation: 6338

I think you are searching for these options:

createTableOptions

This is a JDBC writer-related option. If specified, it allows setting database-specific table and partition options when creating a table (e.g., CREATE TABLE t (name string) ENGINE=InnoDB). This option applies only to writing.
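For instance, a minimal sketch (assuming a MySQL target; the connection details and table name here are hypothetical) might look like:

# Hypothetical sketch: pass storage-engine clauses through createTableOptions;
# Spark appends this string to the CREATE TABLE statement it generates.
df.write \
    .option("createTableOptions", "ENGINE=InnoDB DEFAULT CHARSET=utf8") \
    .jdbc("jdbc:mysql://localhost:3306/mydb", "sample_table",
          mode="overwrite",
          properties={"user": "username", "password": "password"})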

createTableColumnTypes

The database column data types to use instead of the defaults when creating the table. Data type information should be specified in the same format as CREATE TABLE columns syntax (e.g., "name CHAR(64), comments VARCHAR(1024)"). The specified types should be valid Spark SQL data types. This option applies only to writing.

Example:

# Specifying create table column data types on write
jdbcDF.write \
    .option("createTableColumnTypes", "name CHAR(64), comments VARCHAR(1024)") \
    .jdbc("jdbc:postgresql:dbserver", "schema.tablename",
          properties={"user": "username", "password": "password"})

Ref: Spark documentation, "JDBC To Other Databases"

Upvotes: 3
