reachify
reachify

Reputation: 3827

Concat column values in a dataframe

I have a csv file that looks like the following

name, state, a, b, c, d, ..., x
Jon,  NY,    1, 4, 6, 2,      6
Eric, CA,    5, 3, 1, 5,      6
Chris,LA,    4, 4, 3, 1,      5

and I want the following result (one column with concat fields (inkl header name))

concate-fields
"name=Jon,   state=NY, a=1, b=4, c=6, d= 2, ... x=6"
"name=Eric,  state=CA, a=5, b=3, c=1, d= 5, ... x=6"
"name=Chris, state=LA, a=4, b=4, c=3, d= 1, ... x=5"

There can be many headers from a...>x so these should be appended in a generic way

I now have

import org.apache.spark.sql.functions.{concat, lit}
val lp = sample.select(concat(lit("name), $"name", lit(",state="), $"state")
display(lp)

But I have trouble adding the same for column a->x (as this needs to be done in a generic way)

Upvotes: 2

Views: 176

Answers (1)

Mohana B C
Mohana B C

Reputation: 5487

You can dynamically create SQL expression to concat columns by calling map method on df.columns() as shown below.

val df = // Read CSV

df.withColumn("concate-fields", expr(s"concat(${df.columns.map(col=>s"'$col=', nvl($col,'null'),','").mkString("").dropRight(4)})"))
    .withColumn("concate-fields", concat(lit("\""),col("concate-fields"),lit("\"")))

Upvotes: 1

Related Questions