MGM
MGM

Reputation: 27

Variable substitution for column names in hive query

I have a task where in I need to compare 2 columns of a dataframe and get the differences.There are 200+ columns in the dataframe and i have to write 100+ queries to check the values in columns. eg: DF1 https://i.sstatic.net/Aj1ca.png

I need all the values where X1 = X2 and column pairs have different value. In simple terms-

select A1,A2 from DF1 where X1=X2 and A1!=A2
select B1,B2 from DF1 where X1=X2 and B1!=B2
select C1,C2 from DF1 where X1=X2 and C1!=C2

Now as I have 100+ columns so I have to write 100+ such queries. So I wanted to write a function in scala where in I would just pass the column names(A1,A2 or B1,B2, etc) which would be substituted in the hive query.

 def comp_col(a:Any, b:Any):Any= {
var ret = sqlc.sql("SELECT $a, $b from DF1 WHERE X1= X2 $a!= $b");
return ret;

}

Is there anyway wherein the query in function would take the column names from the variables that I pass. Any different approach is also welcome.

Thanks in Advance.

Upvotes: 0

Views: 822

Answers (1)

Ramandeep Nanda
Ramandeep Nanda

Reputation: 519

Yes, Use string interpolation for scala.

def comp_col(a:Any, b:Any):Any= {
var ret = sqlc.sql(s"SELECT $a, $b from DF1 WHERE X1= X2 $a!= $b");
return ret;
}

Upvotes: 0

Related Questions