Reputation: 225
I have a requirement to write data from a CSV file/pandas DataFrame to a Databricks table. My Python code may not be running on a Databricks cluster; it may run on an isolated standalone node. I am using the Databricks SQL Connector for Python to select data from a Databricks table, and selects are working. But I am unable to load data from a CSV file or a pandas DataFrame into Databricks.
Can I use the Databricks SQL Connector for Python to bulk-load the data in a CSV file/pandas DataFrame into a Databricks table?
Below is the code snippet for getting the Databricks connection and performing selects on the standalone node using the databricks-sql-connector.
from databricks import sql

conn = sql.connect(server_hostname=self.server_name,
                   http_path=self.http_path,
                   access_token=self.access_token)
try:
    with conn.cursor() as cursor:
        cursor.execute(qry)
        return cursor.fetchall_arrow().to_pandas()
except Exception as e:
    print("Exception Occurred:" + str(e))
Note: My CSV file is on Azure ADLS Gen2 storage. I am reading this file to create a pandas DataFrame. All I need is to either load the data from pandas into a Databricks Delta table, or to read the CSV file and load its data into a Delta table. Can this be achieved using the databricks-sql-connector instead of using Spark?
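For reference, this is roughly how I read the file into pandas (the storage account, container, and path are placeholders; reading abfs:// URLs this way assumes the adlfs package is installed alongside pandas):

import pandas as pd

# Placeholder ADLS Gen2 location; requires the adlfs fsspec driver.
pdf = pd.read_csv(
    "abfs://my-container@mystorageaccount.dfs.core.windows.net/data/input.csv",
    storage_options={"account_key": "<storage-account-key>"},
)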
Upvotes: 1
Views: 1944
Reputation: 4544
"Can this be achieved using the databricks-python connector instead of using Spark?"
The Databricks SQL Connector for Python is a Python library that allows you to use Python code to run SQL commands on Databricks clusters and Databricks SQL warehouses.
So, the Databricks SQL Connector for Python offers no mechanism to write a pandas DataFrame to a Delta table.
Coming to the second part of your question, whether there is any other way to convert a pandas DataFrame to a Delta table without using Spark: since Delta Lake is tied to Spark, as far as I know there is no way to convert a pandas DataFrame to a Delta table without it.
Alternatively, I suggest you read the file as a Spark DataFrame and then write it out in Delta format using the code below.
# Read the CSV into a Spark DataFrame, then persist it as a Delta table.
file_location = "/mnt/tables/data.csv"
table_name = "my_table"  # placeholder target table name

df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("sep", ",")
      .load(file_location))

df.write.mode("overwrite").format("delta").saveAsTable(table_name)
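If the data is already in a pandas DataFrame, the same idea works by converting it first. A minimal sketch, assuming the pandas DataFrame is named pdf and "my_table" is a placeholder table name:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Convert the pandas DataFrame to a Spark DataFrame, then persist it as Delta.
spark_df = spark.createDataFrame(pdf)
spark_df.write.mode("overwrite").format("delta").saveAsTable("my_table")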
Upvotes: 1