Feynman27

Reputation: 3267

Concatenate String to each element of a List in a Spark dataframe with Scala

I have two columns in a Spark dataframe: one is a String, and the other is a List of Strings. How do I create a new column that concatenates the String in column 1 with each element of the list in column 2, producing another list in column 3?

For example, if column 1 is "a", and column 2 is ["A","B"], I'd like the output in column 3 of the dataframe to be ["aA","aB"].

So far, I have:

val multiplier = (x1: String, x2: Seq[String]) => {x1+x2}
val multiplierUDF = udf(multiplier)
val df2 = df1
  .withColumn("col3", multiplierUDF(df1("col1"),df1("col2")))

which gives aWrappedArray(A,B) rather than a list of concatenated strings.

Upvotes: 3

Views: 5364

Answers (2)

Alfredo Gimenez

Reputation: 2224

I suggest you try your UDFs outside of Spark and get them working on local values first. If you do:

val multiplier = (x1: String, x2: Seq[String]) => {x1+x2}
multiplier("a", Seq("A", "B"))

// output
res1: String = aList(A, B)

You'll see multiplier doesn't do what you want: x1+x2 appends the string form of the whole Seq to x1 instead of prefixing each element.

I think you're looking for:

val multiplier = (x1: String, x2: Seq[String]) => x2.map(x1+_)
multiplier("a", Seq("A", "B"))

// output
res2: Seq[String] = List(aA, aB)
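
To tie this back to the dataframe in the question, here is a minimal sketch of wiring the corrected function into a UDF. The object name, the local SparkSession settings, and the one-row sample dataframe are my own assumptions for illustration; the column names col1, col2, col3 follow the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object ConcatPrefixExample {
  // Plain Scala function: prefix every element of the list with the string.
  val prefixEach: (String, Seq[String]) => Seq[String] =
    (prefix, items) => items.map(prefix + _)

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("concat-prefix")
      .getOrCreate()
    import spark.implicits._

    // Sample data standing in for df1 from the question (assumed).
    val df1 = Seq(("a", Seq("A", "B"))).toDF("col1", "col2")

    // Wrap the function as a UDF and apply it to both columns.
    val prefixEachUDF = udf(prefixEach)
    val df2 = df1.withColumn("col3", prefixEachUDF($"col1", $"col2"))

    // col3 should now hold the per-element concatenation.
    df2.show()

    spark.stop()
  }
}
```

Note that this sketch does not guard against null values in either column; a null list would make items.map throw inside the UDF.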

Upvotes: 4

Alberto Bonsanto

Reputation: 18042

I think you should redefine your UDF to something similar to my function append:

val a = Seq("A", "B")
val p = "a"

def append(init: String, tails: Seq[String]) = tails.map(x => init + x)

append(p, a)

//res1: Seq[String] = List(aA, aB)
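
If the columns can contain missing values, the function may receive null for either argument once it runs as a UDF. A hedged, null-tolerant variant of append might look like the sketch below; the names AppendSketch and appendSafe are hypothetical, and returning null for missing input is just one possible policy.

```scala
object AppendSketch {
  // Returns null when either input is missing, otherwise prefixes each element.
  // Without the guard, a null list would throw, and a null prefix would
  // silently produce strings such as "nullA".
  def appendSafe(init: String, tails: Seq[String]): Seq[String] =
    if (init == null || tails == null) null
    else tails.map(x => init + x)

  def main(args: Array[String]): Unit = {
    println(appendSafe("a", Seq("A", "B"))) // List(aA, aB)
    println(appendSafe("a", null))          // null
  }
}
```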

Upvotes: 2
