Reputation: 847
I am reading some Hive tables using a Spark session:
from pyspark.sql import SparkSession
spark = (
    SparkSession
    .builder
    .appName("Test")
    .enableHiveSupport()
    .getOrCreate()
)
def read(table_path):
    return spark.read.table(table_path)
read("aaaa.bbbb")
read("aaaa.cccc")
read("dddd.eeee")
Most of the time, I have no issues. But sometimes I get this error:
Mismatched input '-' expecting <EOF>
Do you know if there is an option to avoid this error? Also, can you help me to find the documentation? I searched but found nothing.
Thank you:)
Upvotes: 0
Views: 164
Reputation: 12960
When you want to read a table, you need to provide the table name, not the table path. You provide a path only when you are reading files directly, so I am not sure exactly what your requirement is.
However, there are two ways you can read the table:
from os.path import abspath
from pyspark.sql import SparkSession
from pyspark.sql import Row
# warehouse_location points to the default location for managed databases and tables
warehouse_location = abspath('spark-warehouse')
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL Hive integration example") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()
Method 1: Use a query
df = spark.sql("select * from database.table_name")
Method 2: Use the table API
df = spark.table("database.table_name")
Documentation: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html
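One possible cause of the "Mismatched input '-'" message is a database or table name that contains a hyphen, which the SQL parser may read as a minus sign unless the identifier is wrapped in backticks. This is an assumption, since the question does not show the name that failed. The sketch below uses a hypothetical helper, quote_table_name (not part of PySpark), that backtick-quotes each part of a dotted name when it is not a plain identifier. It assumes the name parts themselves contain no dots.

```python
import re

# Hypothetical helper, not a PySpark API: backtick-quote the parts of a
# dotted table name that are not plain identifiers (letters, digits, "_").
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_table_name(name: str) -> str:
    quoted_parts = []
    # Assumption: "." only separates database and table, never appears inside a part.
    for part in name.split("."):
        part = part.strip("`")  # tolerate parts that are already quoted
        if _PLAIN_IDENTIFIER.fullmatch(part):
            quoted_parts.append(part)
        else:
            # Escape embedded backticks by doubling them, then wrap the part.
            quoted_parts.append("`" + part.replace("`", "``") + "`")
    return ".".join(quoted_parts)
```

If a hyphenated name does turn out to be the cause, the quoted result could then be passed to either method above, for example spark.table(quote_table_name("my-db.my-table")), where "my-db.my-table" is a made-up name for illustration.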
Upvotes: 0