Ayush Goyal

Reputation: 392

Accessing an already present table in Hive

I have created a book_crossing_dataset database in Hive and created 3 tables in it.

1) bx_books 2) bx_books_ratings 3) bx_user

like below:

create database book_crossing_dataset;
use book_crossing_dataset;
add jar /home/cloudera/Downloads/ccsv-serde-0.9.1.jar;

create external table stage_bx_user(
  User_ID int,
  Location string,
  Age int
)
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\;",
"quoteChar" = "\"")
stored as textfile
tblproperties ("skip.header.line.count"="1");

load data local inpath "/home/cloudera/workspace/BX-CSV-Dump/BX-Users.csv" into table stage_bx_user;

create external table bx_user(
 User_ID int,
 Location string,
 Age int
)
stored as parquet;

insert into table bx_user select * from stage_bx_user;

Now I want to query this table from Spark, but when I use the code below

from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import HiveContext


conf = SparkConf().setAppName("Book Crossing")

sc = SparkContext(conf=conf)

hc = HiveContext(sc)

books = hc.sql("show databases")

books.show()

only the default database is shown.

I am using the following question as a reference: Query HIVE table in pyspark

Upvotes: 0

Views: 79

Answers (1)

Richard Nemeth

Reputation: 1874

You have a call to create the database, but you never use it in the create table call. I'd suggest changing the first 3 lines of the script to:

create database if not exists book_crossing_dataset;
use book_crossing_dataset;
add jar /home/cloudera/Downloads/ccsv-serde-0.9.1.jar;

If this doesn't help, then the issue lies in the Spark configuration. I'd suggest trying a SparkSession with Hive support enabled:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
        .appName("Book Crossing") \
        .enableHiveSupport() \
        .getOrCreate()

spark.sql("show databases").show()

Upvotes: 1
