user3436624

Reputation: 764

writing a UDF in spark sql with scala

I'm writing a UDF in Spark SQL and I'm wondering whether there is somewhere I can read documentation about exactly what is and isn't possible in this regard, or a tutorial? I'm using SQLContext, not HiveContext.

The examples I've seen typically involve passing in a string, transforming it, and then outputting some transformed string or other object, which I've managed to do successfully. But what if one wanted to pass in an input that was really some kind of Spark SQL Row object, or a List of Row objects, each of which has fields with key-value pairs, etc.? In my case I'm passing in a List of Row objects by telling the UDF the input is List[Map[String, Any]]. I think the issue is partly that it's really some kind of GenericRowWithSchema object, not a List or Array.
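For what it's worth, a struct or array-of-structs column reaches a Scala UDF as `Row` / `Seq[Row]` rather than as `List[Map[String, Any]]`. A minimal sketch of a UDF declared that way (the column name `"name"` and the registered name `extractNames` are hypothetical, and this assumes a Spark 1.x SQLContext):

```scala
import org.apache.spark.sql.Row

// A UDF over an array-of-structs column: each struct arrives as a Row,
// so the whole column arrives as Seq[Row], not List[Map[String, Any]].
val extractNames = (rows: Seq[Row]) =>
  rows.map(r => r.getAs[String]("name"))

sqlContext.udf.register("extractNames", extractNames)
// then: SELECT extractNames(people) FROM someTable
```

The key point is that you pattern the UDF's parameter types after how Spark hands the data to Scala (`Row`, `Seq[Row]`), not after how you conceptually model it.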

Also, I noticed the LATERAL VIEW with explode option. I think this would in theory work for my case, but it didn't work for me. I suspect it's because I'm not using a HiveContext, but I can't change that.

Upvotes: 2

Views: 2581

Answers (1)

agsachin

Reputation: 606

What I got from the question is that, first, you want to read a Row inside a UDF.

Define the UDF

import org.apache.spark.sql.Row

def compare(r: Row) = { r.get(0) == r.get(1) }

Register the UDF

sqlContext.udf.register("compare", compare _)

Create the dataFrame

val TestDoc = sqlContext.createDataFrame(Seq(("sachin", "sachin"), ("aggarwal", "aggarwal1"))).toDF("text", "text2")

Use the UDF

scala> TestDoc.select($"text", $"text2", callUdf("compare", struct($"text", $"text2")).as("comparedOutput")).show

Result:

+--------+---------+--------------+
|    text|    text2|comparedOutput|
+--------+---------+--------------+
|  sachin|   sachin|          true|
|aggarwal|aggarwal1|         false|
+--------+---------+--------------+

Your second question is about LATERAL VIEW with the explode option; for that it's better to use a HiveContext.
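If switching to HiveContext is not an option, a possible workaround is the DataFrame API's `explode` method, which works on a plain SQLContext in Spark 1.x. A sketch under that assumption (the column names `key`/`values`/`value` are made up for the example):

```scala
// Assumes a Spark 1.x SQLContext and the usual implicits in scope.
val df = sqlContext.createDataFrame(Seq(
  ("a", Seq(1, 2)),
  ("b", Seq(3))
)).toDF("key", "values")

// DataFrame.explode turns each element of the array column
// into its own output row, paired with the other columns.
val exploded = df.explode("values", "value") { vs: Seq[Int] => vs }
exploded.select("key", "value").show()
```

This gives roughly what `LATERAL VIEW explode(values)` would give in HiveQL, without needing Hive support.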

Upvotes: 6
