Reputation: 1
I used PySpark with Hudi and created a table locally at the path /tmp/table_name.
I want to read (SELECT *) the same table in spark-sql, but when I do SHOW TABLES the table_name is not found. I did set confs like spark.sql.catalog.local.warehouse and spark.sql.warehouse.dir, but still no luck.
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("YourAppName") \
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.hudi.catalog.HoodieCatalog") \
    .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension") \
    .config("spark.kryo.registrator", "org.apache.spark.HoodieSparkKryoRegistrar") \
    .config("spark.jars.packages", "org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.1") \
    .getOrCreate()
# after some steps I do
inserts.write.format("hudi") \
    .options(**hudi_options) \
    .mode("overwrite") \
    .save(basePath)
# here the basePath value is /tmp
When I read the same table back from PySpark, I am able to query it.
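The PySpark read is just a path-based load, roughly like this (a minimal sketch; spark and basePath are the same session and path as above):

# load the Hudi table directly from its path; no catalog or metastore lookup is involved
df = spark.read.format("hudi").load(basePath)
df.show()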
Then, when I start spark-sql with the above-mentioned confs (spark.sql.catalog.local.warehouse, spark.sql.warehouse.dir) and run show tables;, all I get is:

WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Time taken: 1.434 seconds
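For context, the spark-sql shell is launched roughly like this (a sketch, not the exact command; the warehouse path is a placeholder and the Hudi confs mirror the PySpark session above):

spark-sql \
  --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.1 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
  --conf spark.sql.warehouse.dir=/tmp/spark-warehouse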
Note: I have no Hive metastore.
Upvotes: 0
Views: 695
Reputation: 19
Did you register the DataFrame as a view in the SparkSession?
df = spark.createDataFrame([(2, "Alice"), (5, "Bob")], schema=["age", "name"])
df.createTempView("people")
df2 = spark.sql("SELECT * FROM people")
After registering your DataFrame as a view within the session, you should be able to reference it using Spark SQL.
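Applied to your Hudi table, the same idea would look roughly like this (a sketch; the path is the /tmp/table_name location from your question):

# read the Hudi table from its path and register it as a temp view for SQL
hudi_df = spark.read.format("hudi").load("/tmp/table_name")
hudi_df.createOrReplaceTempView("table_name")
spark.sql("SELECT * FROM table_name").show()

Keep in mind that a temp view is scoped to the SparkSession that created it, so a separate spark-sql shell will not see it unless the table is also registered in a shared metastore.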
Upvotes: -1