Reputation: 83
I'm trying to copy a SQL table from one database to another database on another server using Databricks. I have heard that one way to do this is with pyodbc, because I need to read the data from a stored procedure and JDBC does not support reading from stored procedures. I want to use code similar to the one below:
import pyodbc

# Connect to the source Azure SQL database
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=mydatabe.database.azure.net;'
    'DATABASE=AdventureWorks;UID=jonnyFast;'
    'PWD=MyPassword')
conn.autocommit = True

# Example getting records back from a stored procedure (could also be a SELECT statement)
cursor = conn.cursor()
execsp = "EXEC GetConfig 'Dev'"
cursor.execute(execsp)

# Get all records
rc = cursor.fetchall()
The question is, once I get the data into the rc variable using pyodbc, should I bother moving the data into a Databricks DataFrame, or should I just push the data out to my destination?
Upvotes: 1
Views: 793
Reputation: 87069
You may not need to convert the data into a DataFrame at all and can simply write it to the destination. It really depends on how much data you're trying to push: if it's a lot, creating a DataFrame may help because the write will be parallelized (though that may overload the destination server); if it's not much data, just write it straight to the destination.
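For example, if the volume justifies the DataFrame route, a rough sketch of that path could look like the following (the destination JDBC URL, the dbo.Config table name, and the credentials below are placeholders, and column types are left to Spark's schema inference):

# Build a Spark DataFrame from the pyodbc result set
# (`spark` is already available in a Databricks notebook)
columns = [col[0] for col in cursor.description]   # column names from the cursor metadata
df = spark.createDataFrame([tuple(row) for row in rc], schema=columns)

# Parallelized write over JDBC to the destination server
(df.write
   .format("jdbc")
   .option("url", "jdbc:sqlserver://mydestination.database.windows.net:1433;databaseName=TargetDb")
   .option("dbtable", "dbo.Config")
   .option("user", "jonnyFast")
   .option("password", "MyPassword")
   .mode("append")
   .save())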
Also, in this case your worker nodes aren't really used, because all of the processing happens on the driver node - you could consider a so-called Single Node cluster, but you will need to size the driver node according to your result set.
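If you stay on the driver and the data is small, a plain pyodbc write to the destination is enough; a minimal sketch, assuming a dbo.Config (ConfigKey, ConfigValue) target table and placeholder server/credentials - everything here runs on the driver node, which is why its sizing matters:

# Second pyodbc connection, this time to the destination server
dest = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=mydestination.database.windows.net;'
    'DATABASE=TargetDb;UID=jonnyFast;PWD=MyPassword')

dest_cursor = dest.cursor()
dest_cursor.fast_executemany = True   # send rows in batches instead of one round trip per row
dest_cursor.executemany(
    "INSERT INTO dbo.Config (ConfigKey, ConfigValue) VALUES (?, ?)",
    [tuple(row) for row in rc])
dest.commit()
dest.close()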
P.S. You can also look at the alternatives listed in this answer.
Upvotes: 1