Trying to convert a "org.apache.spark.sql.DataFrame" object to pandas dataframe results in error "name 'dataframe' is not defined" in Databricks

Question

I am trying to query an SQL database via jdbc connection in databricks and store the query results as a pandas dataframe. All of the methods I can find for this online involve storing it as a type of Spark object first using Scala code and then converting this to pandas. I tried for cell 1:

%scala
val df_table1 = sqlContext.read.format("jdbc").options(Map(
    ("url" -> "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"),
    ("dbtable" -> "(select top 10 * from myschema.table) as table"),
    ("user" -> "user"),
    ("password" -> "password123"),
    ("driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver"))
).load()

which results in:

df_table1: org.apache.spark.sql.DataFrame = [var1: int, var2: string ... 50 more fields]

Great! But when I try to convert it to a pandas df in cell 2 so I can use it:

import numpy as np
import pandas as pd 

result_pdf = df_table1.select("*").toPandas()

print(result_pdf)

It generates the error message:

NameError: name 'df_table1' is not defined

How do I successfully convert this object to a pandas dataframe, or alternatively is there any way of querying the SQL database via jdbc connection using python code without needing to use Scala at all (I do not particularly like Scala syntax and would rather avoid it if at all possible)?

Trying to convert a "org.apache.spark.sql.DataFrame" object to pandas dataframe results in error "name 'dataframe' is not defined" in Databricks

Answers (1)

Related Questions

Trying to convert a &quot;org.apache.spark.sql.DataFrame&quot; object to pandas dataframe results in error &quot;name &#39;dataframe&#39; is not defined&quot; in Databricks

Answers (1)

Related Questions

Trying to convert a "org.apache.spark.sql.DataFrame" object to pandas dataframe results in error "name 'dataframe' is not defined" in Databricks