Reputation: 392
I have created a book_crossing_dataset database in Hive and created 3 tables in it:
1) bx_books 2) bx_books_ratings 3) bx_user
like below:
create database book_crossing_dataset;
use book_crossing_dataset;
add jar /home/cloudera/Downloads/ccsv-serde-0.9.1.jar;
create external table stage_bx_user(
User_ID int,
Location string,
Age int
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\;",
"quoteChar" = "\"")
stored as textfile
tblproperties ("skip.header.line.count"="1");
load data local inpath "/home/cloudera/workspace/BX-CSV-Dump/BX-Users.csv" into table stage_bx_user;
create external table bx_user(
User_ID int,
Location string,
Age int
)
stored as parquet;
insert into table bx_user select * from stage_bx_user;
Now I want to query this table from Spark, but when I use the code below
from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import HiveContext
conf = SparkConf().setAppName("Book Crossing")
sc = SparkContext(conf=conf)
hc = HiveContext(sc)
books = hc.sql("show databases")
print(books.show())
only the default database is shown.
I am using the below link as reference: Query HIVE table in pyspark
Upvotes: 0
Views: 79
Reputation: 1874
You have a call to create the database, but you never use it in the create table call. I'd suggest changing the first 3 lines of your script to:
create database if not exists book_crossing_dataset;
use book_crossing_dataset;
add jar /home/cloudera/Downloads/ccsv-serde-0.9.1.jar;
If this doesn't help, then the issue lies in the Spark configuration. I'd suggest trying a SparkSession with Hive support enabled:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("Book Crossing") \
    .enableHiveSupport() \
    .getOrCreate()
spark.sql("show databases").show()
Upvotes: 1