Reputation: 11
I need to update
a Table Hive like
update A from B
set
Col5 = A.Col2,
Col2 = B.Col2,
DT_Change = B.DT,
Col3 = B.Col3,
Col4 = B.Col4
where A.Col1 = B.Col1 and A.Col2 <> B.Col2
Using Scala Spark RDD
How can I do this ?
Upvotes: 0
Views: 6475
Reputation: 41
I want to split this question in to two questions to explain it simple.
First question : How to write Spark RDD data to Hive table ?
The simplest way is to convert the RDD in to Spark SQL (dataframe) using method rdd.toDF()
. Then register the dataframe as temptable using df.registerTempTable("temp_table")
. Now you can query from the temptable and insert in to hive table using sqlContext.sql("insert into table my_table select * from temp_table")
.
Second question: How to update Hive table from Spark ?
As of now, Hive is not a best fit for record level updates. Updates can only be performed on tables that support ACID. One primary limitation is only ORC format supports updating Hive tables. You can find some information on it from https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions
You can refer How to Updata an ORC Hive table form Spark using Scala for this.
Few methods might have deprecated with spark 2.x and you can check spark 2.0 documentation for the latest methods. While there could be better approaches, this is the simplest approach that I can think of which works.
Upvotes: 1