shankar

Reputation: 225

can we load the data from pandas dataframe to databricks table without spark.sql

I have a requirement to write data from a CSV file / pandas DataFrame to a Databricks table. My Python code may not be running on a Databricks cluster; it may run on an isolated standalone node. I am using the Databricks SQL Connector for Python to select data from a Databricks table. Selects are working, but I am unable to load data from a CSV file or pandas DataFrame into Databricks.

Can I use the Databricks SQL Connector for Python to bulk-load data from a CSV file / pandas DataFrame into a Databricks table?

Below is the code snippet that gets the Databricks connection and performs selects on the standalone node using the databricks-sql-connector.

from databricks import sql

# server_name, http_path and access_token come from the enclosing class
conn = sql.connect(server_hostname=self.server_name,
                   http_path=self.http_path,
                   access_token=self.access_token)
try:
    with conn.cursor() as cursor:
        cursor.execute(qry)
        return cursor.fetchall_arrow().to_pandas()
except Exception as e:
    print("Exception Occurred: " + str(e))

Note: My CSV file is on Azure ADLS Gen2 storage. I am reading this file to create a pandas DataFrame. All I need is to either load the data from pandas into a Databricks Delta table, or read the CSV file and load its data into a Delta table. Can this be achieved using the databricks-sql-connector instead of Spark?

Upvotes: 1

Views: 1944

Answers (1)

Utkarsh Pal

Reputation: 4544

Can this be achieved using databricks-python connector instead of using spark?

The Databricks SQL Connector for Python is a Python library that allows you to use Python code to run SQL commands on Databricks clusters and Databricks SQL warehouses.

So the Databricks SQL Connector for Python has no built-in facility to convert a pandas DataFrame into a Delta table.

Coming to the second part of your question: is there any other way to convert a pandas DataFrame to a Delta table without using spark.sql?

Since Delta Lake is tightly coupled with Spark, as far as I know there is no way to convert a pandas DataFrame to a Delta table without using Spark.

Alternatively, I suggest you read the file as a Spark DataFrame and then write it out in Delta format using the (Scala) code below.

val file_location = "/mnt/tables/data.csv"
val table_name = "my_table"  // name of the target Delta table

val df = spark.read.format("csv")
  .option("inferSchema", "true")
  .option("header", "true")
  .option("sep", ",")
  .load(file_location)

df.write.mode("overwrite").format("delta").saveAsTable(table_name)
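Since the question is about Python, here is a rough PySpark equivalent of the Scala snippet above. This is a sketch, not a tested solution: it assumes it runs inside a Databricks notebook or job where a SparkSession named `spark` is already available, and the file path and table name are placeholders you would replace with your own.

```
# PySpark equivalent of the Scala snippet above.
# Assumes a Databricks environment where `spark` is predefined;
# file_location and table_name are placeholders.
file_location = "/mnt/tables/data.csv"
table_name = "my_table"

df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("sep", ",")
      .load(file_location))

# Write the DataFrame out as a managed Delta table.
df.write.mode("overwrite").format("delta").saveAsTable(table_name)
```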

Upvotes: 1
