I have two columns in a Spark DataFrame: one is a String, and the other is a list of Strings. How do I create a new column that concatenates the String in column 1 with each element of the list in column 2, producing another list in column 3?
For example, if column 1 is "a" and column 2 is ["A","B"], I'd like column 3 of the DataFrame to be ["aA","aB"].
So far, I have:
val multiplier = (x1: String, x2: Seq[String]) => {x1+x2}
val multiplierUDF = udf(multiplier)
val df2 = df1
.withColumn("col3", multiplierUDF(df1("col1"),df1("col2")))
which gives the single string aWrappedArray(A,B) rather than a list.
I suggest testing your UDF functions outside of Spark and getting them working on local values first. If you run:
val multiplier = (x1: String, x2: Seq[String]) => {x1+x2}
multiplier("a", Seq("A", "B"))
// output
res1: String = aList(A, B)
You'll see that multiplier doesn't do what you want: `+` appends the list's string representation to x1 instead of prefixing each element.
I think you're looking for:
val multiplier = (x1: String, x2: Seq[String]) => x2.map(x1+_)
multiplier("a", Seq("A", "B"))
//output
res2: Seq[String] = List(aA, aB)
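Following the advice to test outside Spark first, here is a minimal standalone sketch that checks the corrected multiplier on a couple of local inputs, including an empty list. The object name MultiplierCheck is only for illustration; once this behaves as expected, the same function can be passed to udf exactly as in the question.

```scala
object MultiplierCheck {
  // Corrected function: prefix each element of x2 with x1
  val multiplier = (x1: String, x2: Seq[String]) => x2.map(x1 + _)

  def main(args: Array[String]): Unit = {
    // Prints List(aA, aB)
    println(multiplier("a", Seq("A", "B")))
    // An empty list maps to an empty list: prints List()
    println(multiplier("a", Seq.empty[String]))
  }
}
```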
I think you should redefine your UDF to something similar to my function append:
val a = Seq("A", "B")
val p = "a"
def append(init: String, tails: Seq[String]) = tails.map(x => init + x)
append(p, a)
//res1: Seq[String] = List(aA, aB)
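One thing to watch for when turning append into a UDF: assuming col2 can contain null values, Spark will (as far as I know) hand a null Seq to the function, and tails.map then throws a NullPointerException. A minimal null-safe sketch, using the hypothetical name appendSafe, returns null instead so that row simply gets a null col3:

```scala
object AppendNullSafe {
  // Same logic as append, but a null tails (e.g. a null array cell) yields null instead of an NPE
  def appendSafe(init: String, tails: Seq[String]): Seq[String] =
    Option(tails).map(_.map(x => init + x)).orNull

  def main(args: Array[String]): Unit = {
    println(appendSafe("a", Seq("A", "B"))) // prints List(aA, aB)
    println(appendSafe("a", null))          // prints null
  }
}
```

It can then be wrapped with udf(AppendNullSafe.appendSafe _) in the same way as multiplierUDF in the question.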