ASH
ASH

Reputation: 20342

Trying to Push Records from a Dataframe to a SQL Server Table

I am trying to find a way to push everything from a dataframe into a SQL Server table. I did some Googling and came up with this.

Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .save()

My dataframe is named 'Cluster' and it is of type pyspark.sql.dataframe.DataFrame. If the table exists in the DB, when I run the code above, I get this error message:

org.apache.spark.sql.AnalysisException: Table or view 'dbo.Cluster' already exists. SaveMode: ErrorIfExists.;

If I delete the table and run the code above I get this error message:

java.lang.IllegalArgumentException: Can't get JDBC type for struct<type:tinyint,size:int,indices:array<int>,values:array<double>>

This should be do-able. I've done this before in a Spark environment; that code worked perfectly fine. Now, I am working in a Python environment (using Databricks), and I'm stuck on this one thing.

Just to come full-circle, I'm creating the dataframe this way:

df = spark.read \

.jdbc("jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name", "dbo.table_name",
              properties={"user": "u_name", "password": "p_wd"})

Then...I do some analytics and the data...eventually I apply some ML algos to the data...and come up with two dataframes.

df1 = df1.select("*").toPandas()
df2 = df2("*").toPandas()

Now, I need to concatenate these two dataframes together.

Cluster = pd.concat([df1, df2], axis=1)

Finally, I want to push this dataframe into a SQL Server table. Everything works totally fine until I eventually try to put everything from the dataframe to the table. That's the only issue I have here.

Upvotes: 1

Views: 1214

Answers (1)

Axel R.
Axel R.

Reputation: 1300

You may try with a different SaveMode.Either Append to add data :

Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .mode("append") \
    .save()

or Overwrite to replace the data :

Cluster.write \
    .format("jdbc") \
    .option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
    .option("dbtable", "dbo.Cluster") \
    .option("user", "u_name") \
    .option("password", "p_wd") \
    .mode("overwrite") \
    .save()

Source

Upvotes: 3

Related Questions