Reputation: 101
I have a dataframe with the following schema:
root
 |-- docnumber: string (nullable = true)
 |-- event: struct (nullable = false)
 |    |-- data: struct (nullable = true)
 |    |    |-- codevent: int (nullable = true)
I need to add a field inside event.data so that the schema looks like this:
root
 |-- docnumber: string (nullable = true)
 |-- event: struct (nullable = false)
 |    |-- data: struct (nullable = true)
 |    |    |-- codevent: int (nullable = true)
 |    |    |-- needtoaddit: int (nullable = true)
First I tried
dataframe.withColumn("event.data.needtoaddit", lit("added"))
but that adds a new top-level column literally named event.data.needtoaddit instead of a field nested inside event.data.
Then I tried
dataframe.withColumn(
  "event",
  struct(
    $"event.*",
    struct(
      lit("added")
        .as("needtoaddit")
    ).as("data")
  )
)
but that leaves event with two fields named data, so event.data becomes ambiguous and I still have a problem.
How can I make it work?
Upvotes: 2
Views: 1425
Reputation: 24356
Spark 3.1+
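To add or replace a field inside a struct column, use Column.withField. The field name may be a dotted path, so a field nested in event.data can be added starting from the top-level event column:
col("event").withField("data.needtoaddit", lit("added"))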
Input:
import org.apache.spark.sql.functions.{col, lit, struct}

val df = spark.createDataFrame(Seq(("1", 2)))
  .select(
    col("_1").as("docnumber"),
    struct(struct(col("_2").as("codevent")).as("data")).as("event")
  )
df.printSchema()
// root
//  |-- docnumber: string (nullable = true)
//  |-- event: struct (nullable = false)
//  |    |-- data: struct (nullable = false)
//  |    |    |-- codevent: integer (nullable = false)
Script:
// Call withField on the top-level struct so the data level is preserved;
// assigning col("event.data").withField(...) to "event" would drop that level.
val df2 = df.withColumn(
  "event",
  col("event").withField("data.needtoaddit", lit("added"))
)
df2.printSchema()
// root
//  |-- docnumber: string (nullable = true)
//  |-- event: struct (nullable = false)
//  |    |-- data: struct (nullable = false)
//  |    |    |-- codevent: integer (nullable = false)
//  |    |    |-- needtoaddit: string (nullable = false)
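The desired schema in the question lists needtoaddit as a nullable int, whereas lit("added") produces a non-nullable string. The following is a minimal sketch, assuming Spark 3.1+ and a local session, that adds a typed null instead. The names NestedFieldSketch, sampleDf and addNullableInt are made up for illustration and are not part of any API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}

object NestedFieldSketch {
  // Same sample data as above: docnumber plus event.data.codevent.
  def sampleDf(spark: SparkSession): DataFrame =
    spark.createDataFrame(Seq(("1", 2)))
      .select(
        col("_1").as("docnumber"),
        struct(struct(col("_2").as("codevent")).as("data")).as("event")
      )

  // Adds event.data.needtoaddit as an integer field holding null.
  // The dotted name tells withField to descend into the nested data struct.
  def addNullableInt(df: DataFrame): DataFrame =
    df.withColumn(
      "event",
      col("event").withField("data.needtoaddit", lit(null).cast("int"))
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("withField-sketch")
      .getOrCreate()

    val df2 = addNullableInt(sampleDf(spark))
    df2.printSchema()
    df2.select(col("event.data.needtoaddit")).show()

    spark.stop()
  }
}
```

Replacing lit(null).cast("int") with an expression over existing columns should work the same way, with the new field taking that expression's type and nullability.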
Upvotes: 3
Reputation: 42342
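You're close. Instead of keeping the old data field and adding a second one, rebuild data from its existing fields plus the new one, then wrap it in event again:
// The $"..." syntax needs import spark.implicits._
val df2 = df.withColumn(
  "event",
  struct(
    struct(
      $"event.data.*",
      lit("added").as("needtoaddit")
    ).as("data")
  )
)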
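Note that rebuilding the struct this way keeps only the fields that are listed, so any sibling of data inside event has to be carried over explicitly. A rough sketch of that case, assuming event also had a field named source (a hypothetical field that is not in the question's schema, as are the object and method names):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}

object RebuildStructSketch {
  // Hypothetical input: event holds a sibling field `source` next to `data`.
  def sampleDf(spark: SparkSession): DataFrame =
    spark.createDataFrame(Seq(("1", 2, "web")))
      .select(
        col("_1").as("docnumber"),
        struct(
          col("_3").as("source"),
          struct(col("_2").as("codevent")).as("data")
        ).as("event")
      )

  // Rebuilds event field by field: source is copied over as is,
  // data is recreated from its existing fields plus the new one.
  def addField(df: DataFrame): DataFrame =
    df.withColumn(
      "event",
      struct(
        col("event.source").as("source"),
        struct(
          col("event.data.*"),
          lit("added").as("needtoaddit")
        ).as("data")
      )
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("rebuild-struct-sketch")
      .getOrCreate()

    addField(sampleDf(spark)).printSchema()

    spark.stop()
  }
}
```

This approach does not depend on withField, so it may be an option on Spark versions before 3.1, at the cost of listing every sibling field by hand.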
Upvotes: 1