Reputation: 764
I'm writing a UDF in Spark SQL and I'm wondering whether there is somewhere I can read documentation about exactly what is and isn't possible in this regard? Or a tutorial? I'm using SQLContext, not HiveContext.
The examples I've seen typically involve passing in a string, transforming it, and then outputting some transformed string or other object, which I've managed to do successfully. But what if one wanted to pass in an input that was really some kind of Spark SQL Row object, or a List of Row objects, each of which has fields with key-value pairs, etc.? In my case I'm passing in a List of Row objects by telling the UDF the input is List[Map[String, Any]]. I think the issue is partly that it's really some kind of GenericRowWithSchema object, not a List or Array.
Also, I noticed the LATERAL VIEW with explode option. I think this would in theory work for my case, but it didn't work for me. I suspect it's because I'm not using a HiveContext, but I can't change that.
Upvotes: 2
Views: 2581
Reputation: 606
What I got from the question is that, first, you want to read a Row inside a UDF.
Define the UDF
import org.apache.spark.sql.Row

def compare(r: Row) = { r.get(0) == r.get(1) }
Register the UDF
sqlContext.udf.register("compare", compare _)
Create the dataFrame
val TestDoc = sqlContext.createDataFrame(Seq(("sachin", "sachin"), ("aggarwal", "aggarwal1"))).toDF("text", "text2")
Use the UDF
scala> TestDoc.select($"text", $"text2", callUdf("compare", struct($"text", $"text2")).as("comparedOutput")).show
(Note: callUdf was deprecated in Spark 1.5 in favor of callUDF.)
Result:
+--------+---------+--------------+
| text| text2|comparedOutput|
+--------+---------+--------------+
| sachin| sachin| true|
|aggarwal|aggarwal1| false|
+--------+---------+--------------+
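If you prefer plain SQL over the DataFrame API, you can avoid the Row/struct wrapping entirely by registering a UDF that takes the column values directly. A sketch (the UDF name compareCols and the temp-table name testDoc are my own choices):

```scala
// Register a two-argument UDF so no struct() call is needed in the query.
sqlContext.udf.register("compareCols", (a: String, b: String) => a == b)

// Expose the DataFrame to SQL under an assumed temp-table name.
TestDoc.registerTempTable("testDoc")

sqlContext.sql(
  "SELECT text, text2, compareCols(text, text2) AS comparedOutput FROM testDoc"
).show
```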
The second question is about LATERAL VIEW with explode; in Spark 1.x the LATERAL VIEW syntax in SQL is only supported by the HiveQL parser, so it is better to use a HiveContext if you can.
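If a HiveContext really isn't an option, the DataFrame API's explode function (available since Spark 1.4) gives the same flattening effect under a plain SQLContext. A minimal sketch, assuming a column holding an array of values (the column names key and values are assumptions):

```scala
import org.apache.spark.sql.functions.explode
import sqlContext.implicits._

// A toy DataFrame with an array column to flatten.
val df = sqlContext.createDataFrame(
  Seq(("a", Seq(1, 2)), ("b", Seq(3)))
).toDF("key", "values")

// explode produces one output row per array element,
// mirroring LATERAL VIEW explode(values) in HiveQL.
df.select($"key", explode($"values").as("value")).show
```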
Upvotes: 6