Masterbuilder

Reputation: 509

Select all columns at runtime in Spark SQL, without a predefined schema

I have a dataframe with values in the following format:

+----------+------------+--------+--------+
|resourceId|resourceType|seasonId|seriesId|
+----------+------------+--------+--------+
|1234      |cM-type     |883838  |8838832 |
|1235      |cM-type     |883838  |8838832 |
|1236      |cM-type     |883838  |8838832 |
|1237      |CNN-type    |883838  |8838832 |
|1238      |cM-type     |883838  |8838832 |
+----------+------------+--------+--------+
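
For reference, a minimal sketch that builds a dataframe of this shape (the SparkSession setup and the string column types are assumptions for illustration, not part of the original job):

import org.apache.spark.sql.SparkSession

// Illustrative setup only; in the real job the dataframe comes from elsewhere.
val spark = SparkSession.builder().appName("to-json-example").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  ("1234", "cM-type",  "883838", "8838832"),
  ("1237", "CNN-type", "883838", "8838832")
).toDF("resourceId", "resourceType", "seasonId", "seriesId")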

I want to convert the dataframe into this format:

+----------+----------------------------------------------------------------------------------------+
|resourceId|value                                                                                   |
+----------+----------------------------------------------------------------------------------------+
|1234      |{"resourceId":"1234","resourceType":"cM-type","seasonId":"883838","seriesId":"8838832"} |
|1235      |{"resourceId":"1235","resourceType":"cM-type","seasonId":"883838","seriesId":"8838832"} |
|1236      |{"resourceId":"1236","resourceType":"cM-type","seasonId":"883838","seriesId":"8838832"} |
|1237      |{"resourceId":"1237","resourceType":"CNN-type","seasonId":"883838","seriesId":"8838832"}|
|1238      |{"resourceId":"1238","resourceType":"cM-type","seasonId":"883838","seriesId":"8838832"} |
+----------+----------------------------------------------------------------------------------------+

I know I can get the desired output by listing the fields manually, like this:

val jsonformated = df.select($"resourceId", to_json(struct($"resourceId", $"resourceType", $"seasonId", $"seriesId")).alias("value"))

However, I am trying to pass the column names to struct programmatically, using:

val cols = df.columns.toSeq
val jsonformatted=df.select($"resourceId",to_json(struct("colval",cols)).alias("value"))

For some reason the struct function doesn't take the sequence, even though the API appears to have a method signature that accepts one:

struct(String colName, scala.collection.Seq<String> colNames)

Is there a better way to solve this problem?

Update:

As the answer pointed out, here is the exact syntax to get the output:

val colsList = df.columns.toList
val column: List[Column] = colsList.map(col(_))
val jsonformatted = df.select($"resourceId", to_json(struct(column: _*)).alias("value"))
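
Equivalently, the intermediate list can be skipped by mapping the column names to Column objects inline (a sketch of the same approach; col, struct and to_json come from org.apache.spark.sql.functions):

import org.apache.spark.sql.functions.{col, struct, to_json}

val jsonformatted = df.select($"resourceId", to_json(struct(df.columns.map(col): _*)).alias("value"))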

Upvotes: 1

Views: 474

Answers (1)

user10454119

Reputation: 46

struct does take a sequence. You're just looking at the wrong variant. Use

def struct(cols: Column*): Column 

for example:

import org.apache.spark.sql.functions._

val cols: Seq[String] = ???

struct(cols map col: _*)
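
Applied to the dataframe in the question, that looks roughly like this (a sketch; the column names come from the example above):

import org.apache.spark.sql.functions.{col, struct, to_json}

val cols: Seq[String] = df.columns.toSeq

val result = df.select(col("resourceId"), to_json(struct(cols map col: _*)).alias("value"))

// result.show(false) prints the resourceId/value rows shown in the question.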

Upvotes: 2
