W.R

Reputation: 11

Update a Hive Table Using Spark Scala

I need to update a Hive table like this:

update A from B
    set
        Col5 = A.Col2,
        Col2 = B.Col2,
        DT_Change = B.DT,
        Col3 = B.Col3,
        Col4 = B.Col4
    where A.Col1 = B.Col1 and A.Col2 <> B.Col2

using a Spark RDD in Scala.

How can I do this?

Upvotes: 0

Views: 6475

Answers (1)

Satya

Reputation: 41

I want to split this question into two parts to keep the explanation simple.

First question: how do you write Spark RDD data to a Hive table?

The simplest way is to convert the RDD into a DataFrame using rdd.toDF(), then register the DataFrame as a temporary table with df.registerTempTable("temp_table"). You can then query the temp table and insert into the Hive table with sqlContext.sql("insert into table my_table select * from temp_table"). A sketch of these steps follows the second part below.

Second question: how do you update a Hive table from Spark?

As of now, Hive is not a good fit for record-level updates. Updates can only be performed on tables that support ACID, and one major limitation is that only the ORC format supports updatable Hive tables. You can find more information at https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions and you can refer to How to Updata an ORC Hive table form Spark using Scala for this.
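
A minimal sketch of the first part, assuming a Spark 1.x shell where sc is the SparkContext, sqlContext is a HiveContext, and a Hive table my_table with a matching schema already exists (the case class, values, and table name are only for illustration):

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc) // Hive-aware context
    import sqlContext.implicits._        // enables rdd.toDF()

    // Hypothetical rows matching my_table's schema.
    case class Record(Col1: String, Col2: String, Col3: String, Col4: String)
    val rdd = sc.parallelize(Seq(Record("k1", "v1", "x", "y")))

    // RDD -> DataFrame -> temp table -> Hive table
    val df = rdd.toDF()
    df.registerTempTable("temp_table")
    sqlContext.sql("insert into table my_table select * from temp_table")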

A few of these methods may have been deprecated in Spark 2.x, so check the Spark 2.0 documentation for the current API. There may be better approaches, but this is the simplest one I can think of that works.
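
If enabling ACID/ORC is not an option, a common workaround for the second part is to compute the updated rows with a join and rewrite the table. A minimal sketch against the question's tables A and B, assuming Col1 is a unique, non-null key in B and that A_staging is a hypothetical staging table to swap in afterwards:

    // Rebuild A, taking B's values where Col1 matches and Col2 differs;
    // unmatched rows keep their current values through the left join.
    val updated = sqlContext.sql("""
      select A.Col1,
             case when B.Col1 is null then A.Col2 else B.Col2 end as Col2,
             case when B.Col1 is null then A.Col3 else B.Col3 end as Col3,
             case when B.Col1 is null then A.Col4 else B.Col4 end as Col4,
             case when B.Col1 is null then A.Col5 else A.Col2 end as Col5,
             case when B.Col1 is null then A.DT_Change else B.DT end as DT_Change
        from A
        left join B
          on A.Col1 = B.Col1 and A.Col2 <> B.Col2
    """)

    // Overwriting A while reading from it in the same job is unsafe,
    // so write to a staging table first and swap it in afterwards.
    updated.write.mode("overwrite").saveAsTable("A_staging")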

Upvotes: 1
