Reputation: 429
I want to retrieve data stored on a Cloudera Hadoop cluster via Hive, Spark, or SQL. I have a SQL query written that should fetch data from the cluster. But before that, I want to understand how to set up a connection/cursor with the cluster so that it knows where to read from or write to.
sc = spark.sparkContext
or similarly a plain HiveContext or SparkContext will not suffice.
We probably need to supply the URL of a node and so on. How do I do that?
A small example would suffice.
Upvotes: 0
Views: 123
Reputation: 525
There are two ways to create a table in Hive:
1- Creating an external table schema:
CREATE EXTERNAL TABLE IF NOT EXISTS names_text(
student_ID INT, FirstName STRING, LastName STRING,
year STRING, Major STRING)
COMMENT 'Student Names'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/andrena';
2- a) Create the schema for a managed table:
CREATE TABLE IF NOT EXISTS Names(
student_ID INT, FirstName STRING, LastName STRING,
year STRING, Major STRING)
COMMENT 'Student Names'
STORED AS ORC;
b) Move the external table data to the managed table:
INSERT OVERWRITE TABLE Names SELECT * FROM names_text;
And finally, verify that the Hive warehouse stores the student names in the external and managed tables respectively:
SELECT * FROM names_text;
SELECT * FROM Names;
Upvotes: 1