Reputation: 429
I want to retrieve data stored on a Cloudera Hadoop cluster via Hive, Spark, or SQL. I have a SQL query written that should fetch data from the cluster. But before that, I want to understand how to set up a connection/cursor with the cluster so that it knows where to read from or write to.
sc = spark.sparkContext
or similarly a plain HiveContext or SparkContext will not suffice.
We probably need to supply the URL of a node and so on. How do I do that?
A small example would suffice.
Upvotes: 0
Views: 123
Reputation: 525
There are two ways to create a table in Hive:
1- Creating an external table schema:
CREATE EXTERNAL TABLE IF NOT EXISTS names_text(
student_ID INT, FirstName STRING, LastName STRING,
year STRING, Major STRING)
COMMENT 'Student Names'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/andrena';
2- a) Create the schema for a managed table:
CREATE TABLE IF NOT EXISTS Names(
student_ID INT, FirstName STRING, LastName STRING,
year STRING, Major STRING)
COMMENT 'Student Names'
STORED AS ORC;
b) Move the external table data to the managed table:
INSERT OVERWRITE TABLE Names SELECT * FROM names_text;
And finally, verify that the Hive warehouse stores the student names in the external and managed tables respectively:
SELECT * FROM names_text;
SELECT * FROM Names;
Upvotes: 1