koushik kirugulige

Reputation: 1

Read a table created by PySpark (Hudi format) using spark-sql without a Hive metastore

I used PySpark with Hudi and created a table locally at the path /tmp/table_name.

I want to read (SELECT *) the same table in spark-sql, but when I run show tables the table_name is not found. I did set configs like spark.sql.catalog.local.warehouse and spark.sql.warehouse.dir, but still no luck.

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("YourAppName") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog") \
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension") \
    .config("spark.kryo.registrator", "org.apache.spark.HoodieSparkKryoRegistrar") \
    .config("spark.jars.packages", "org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.1") \
    .getOrCreate()

# after some steps I do

inserts.write.format("hudi") \
    .options(**hudi_options) \
    .mode("overwrite") \
    .save(basePath)

# here the basePath value is /tmp
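The hudi_options dict is not shown in the question; a hypothetical minimal version, using standard Hudi write configuration keys, might look like the following ("uuid" and "ts" are placeholder column names, not taken from the question):

```python
# Hypothetical minimal Hudi write options -- the actual hudi_options used in
# the question is not shown. The keys below are standard Hudi write configs;
# "uuid" and "ts" stand in for your record-key and precombine columns.
hudi_options = {
    "hoodie.table.name": "table_name",
    "hoodie.datasource.write.recordkey.field": "uuid",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.operation": "upsert",
}
```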

When I read the same table from PySpark, it works.

Then, when I start spark-sql with the above-mentioned configs (spark.sql.catalog.local.warehouse, spark.sql.warehouse.dir) and run show tables;, I get: WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException Time taken: 1.434 seconds

Note: I have no Hive metastore.
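For reference, one common way to make a path-based Hudi table visible to the spark-sql shell without an external Hive metastore is to create an external table over the existing location. This is a sketch, assuming spark-sql is launched with the same Hudi bundle and extensions as the PySpark session, and it reuses the /tmp/table_name path from the question:

```sql
-- Register the existing Hudi files as a table in the (embedded) catalog;
-- no data is copied, the table definition just points at the path.
CREATE TABLE table_name USING hudi LOCATION '/tmp/table_name';

SELECT * FROM table_name;
```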

Upvotes: 0

Views: 695

Answers (1)

BlueBike

Reputation: 19

Did you register the DataFrame as a view in the SparkSession?

df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
df.createTempView("people")
df2 = spark.sql("SELECT * FROM people")

After registering your table as a view within the session, you should be able to reference it using Spark SQL.

https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.createTempView.html#pyspark.sql.DataFrame.createTempView

Upvotes: -1
