Ravi

Reputation: 65

How to access a Hive table from Spark

I am new to Spark and am trying to access a Hive table from Spark.

1) Created a HiveContext from the Spark context and queried the table:

import org.apache.spark.sql.hive.HiveContext

val hc = new HiveContext(sc)

val hivetable = hc.sql("SELECT * FROM test_db.Table")

Now that I have the table in Spark, my questions are:

1) Why do we need to register the table?

2) We can run SQL operations directly, so why do we still need DataFrame functions such as join, select, filter, etc.?

What is the difference between SQL queries and DataFrame operations?

3) What is Spark optimization? How does it work?

Upvotes: 2

Views: 3624

Answers (1)

Sandeep Singh

Reputation: 7990

  1. You don't need to register a temporary table if you are accessing a Hive table through Spark's HiveContext. Registering a DataFrame as a temporary table lets you run SQL queries over its data. Suppose instead that you are reading data from a file at some location and want to run SQL queries over it. In that case you create a DataFrame from the Row RDD and register a temporary table over that DataFrame. The SQL queries over that data are then run through a Spark SQLContext in your code.

  2. Both methods use exactly the same execution engine and internal data structures. At the end of the day it all boils down to the personal preference of the developer.

    Arguably, DataFrame queries are much easier to construct programmatically and provide minimal type safety.

    Plain SQL queries can be significantly more concise and easier to understand. They are also portable and can be used without any modification in every supported language. With HiveContext, they can also be used to expose some functionality that may be inaccessible in other ways (for example, UDFs without Spark wrappers).

    Reference: Spark sql queries vs dataframe functions

    Here is a good reference comparing the performance of Spark RDDs, DataFrames, and SparkSQL.

  3. I don't have an answer for this one, so I will leave it to you to research it online :)
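
To make points 1 and 2 concrete, here is a minimal sketch, assuming the Spark 1.x API used in the question (`SQLContext`, `registerTempTable`). The object name, the `people` table, and the sample rows are made up for illustration and stand in for lines you would read from a file. It builds a DataFrame from a Row RDD, registers it as a temporary table, and then expresses the same query once as SQL and once with DataFrame functions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical example object; names and data are illustrative only.
object TempTableSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TempTableSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Stand-in for file data; in practice this could come from sc.textFile(...)
    val lines = sc.parallelize(Seq("alice,34", "bob,27", "carol,41"))

    // Turn each "name,age" line into a Row
    val rowRDD = lines.map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))

    // Describe the columns so Spark can build a DataFrame from the Row RDD
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age", IntegerType, nullable = false)))

    val people = sqlContext.createDataFrame(rowRDD, schema)

    // Point 1: registering gives the DataFrame a name that SQL text can refer to
    people.registerTempTable("people")

    // Point 2: the same query written as SQL and as DataFrame operations
    val viaSql = sqlContext.sql("SELECT name FROM people WHERE age > 30")
    val viaApi = people.filter(people("age") > 30).select("name")

    viaSql.show()
    viaApi.show()

    // Print the logical and physical plans of both queries for comparison
    viaSql.explain(true)
    viaApi.explain(true)

    sc.stop()
  }
}
```

Comparing the two `explain(true)` outputs is a quick way to check for yourself that both styles go through the same planner, and the printed plans are also a reasonable starting point for question 3. On Spark 2.x and later, `SparkSession` and `createOrReplaceTempView` take the place of `SQLContext` and `registerTempTable`.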

Upvotes: 1
