milad ahmadi

Reputation: 545

How to pass elements of a list to concat function?

I am currently using the following approach to concatenate columns in a DataFrame:

val Finalraw = raw.withColumn("primarykey", concat($"prod_id",$"frequency",$"fee_type_code"))

However, I do not want to hardcode the columns, because the number of columns changes every time. I have a list that contains the column names:

val columnNames: List[String] = List("prod_id", "frequency", "fee_type_code")

So the question is: how can I pass the list elements to the concat function instead of hardcoding the column names?

Upvotes: 0

Views: 1498

Answers (2)

stack0114106

Reputation: 8711

Map the list elements to a List[org.apache.spark.sql.Column] in a separate variable, then pass that list to concat with :_*. For example:

scala> val df = Seq(("a","x-","y-","z")).toDF("id","prod_id","frequency","fee_type_code")
df: org.apache.spark.sql.DataFrame = [id: string, prod_id: string ... 2 more fields]

scala> df.show(false)
+---+-------+---------+-------------+
|id |prod_id|frequency|fee_type_code|
+---+-------+---------+-------------+
|a  |x-     |y-       |z            |
+---+-------+---------+-------------+


scala> val arr = List("prod_id", "frequency", "fee_type_code")
arr: List[String] = List(prod_id, frequency, fee_type_code)

scala> val arr_col = arr.map(col(_))
arr_col: List[org.apache.spark.sql.Column] = List(prod_id, frequency, fee_type_code)

scala> df.withColumn("primarykey",concat(arr_col:_*)).show(false)
+---+-------+---------+-------------+----------+
|id |prod_id|frequency|fee_type_code|primarykey|
+---+-------+---------+-------------+----------+
|a  |x-     |y-       |z            |x-y-z     |
+---+-------+---------+-------------+----------+



Upvotes: 1

Shaido

Reputation: 28367

The concat function takes multiple columns as input while you have a list of strings. You need to transform the list to fit the method input.

First, use map to transform the strings into column objects and then unpack the list with :_* to correctly pass the arguments to concat.

val Finalraw = raw.withColumn("primarykey", concat(columnNames.map(col):_*))

For an explanation of the :_* syntax, see "What does `:_*` (colon underscore star) do in Scala?"
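
To see the unpacking in isolation, here is a minimal sketch in plain Scala, without Spark. The object VarargsDemo and the function joinAll are hypothetical stand-ins invented for this illustration; joinAll only mimics the varargs shape of concat and works on strings rather than columns.

```scala
object VarargsDemo {
  // Hypothetical stand-in for a varargs function such as concat:
  // it accepts any number of String arguments and joins them.
  def joinAll(parts: String*): String = parts.mkString

  val columnNames: List[String] = List("prod_id", "frequency", "fee_type_code")

  // joinAll(columnNames) would not compile: a List[String] is not a String.
  // `: _*` expands the list into individual varargs arguments instead.
  val viaList: String = joinAll(columnNames: _*)

  // Same call with every argument written out by hand.
  val byHand: String = joinAll("prod_id", "frequency", "fee_type_code")
}
```

Both calls produce the same result, which is why concat(columnNames.map(col):_*) behaves like the hardcoded version once the strings have been mapped to columns.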

Upvotes: 2
