Reputation: 20342
I am trying to find a way to push everything from a dataframe into a SQL Server table. I did some Googling and came up with this.
Cluster.write \
.format("jdbc") \
.option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
.option("dbtable", "dbo.Cluster") \
.option("user", "u_name") \
.option("password", "p_wd") \
.save()
My dataframe is named 'Cluster' and it is of type pyspark.sql.dataframe.DataFrame. If the table exists in the DB, when I run the code above, I get this error message:
org.apache.spark.sql.AnalysisException: Table or view 'dbo.Cluster' already exists. SaveMode: ErrorIfExists.;
If I delete the table and run the code above I get this error message:
java.lang.IllegalArgumentException: Can't get JDBC type for struct<type:tinyint,size:int,indices:array<int>,values:array<double>>
This should be do-able. I've done this before in a Spark environment; that code worked perfectly fine. Now, I am working in a Python environment (using Databricks), and I'm stuck on this one thing.
Just to come full-circle, I'm creating the dataframe this way:
df = spark.read \
.jdbc("jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name", "dbo.table_name",
properties={"user": "u_name", "password": "p_wd"})
Then...I do some analytics and the data...eventually I apply some ML algos to the data...and come up with two dataframes.
df1 = df1.select("*").toPandas()
df2 = df2("*").toPandas()
Now, I need to concatenate these two dataframes together.
Cluster = pd.concat([df1, df2], axis=1)
Finally, I want to push this dataframe into a SQL Server table. Everything works totally fine until I eventually try to put everything from the dataframe to the table. That's the only issue I have here.
Upvotes: 1
Views: 1214
Reputation: 1300
You may try with a different SaveMode.Either Append to add data :
Cluster.write \
.format("jdbc") \
.option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
.option("dbtable", "dbo.Cluster") \
.option("user", "u_name") \
.option("password", "p_wd") \
.mode("append") \
.save()
or Overwrite to replace the data :
Cluster.write \
.format("jdbc") \
.option("url", "jdbc:sqlserver://name_here.database.windows.net:1433;databaseName=db_name") \
.option("dbtable", "dbo.Cluster") \
.option("user", "u_name") \
.option("password", "p_wd") \
.mode("overwrite") \
.save()
Upvotes: 3