Reputation: 1530
I have a dataframe df with the following schema:
root
|-- city_name: string (nullable = true)
|-- person: struct (nullable = true)
| |-- age: long (nullable = true)
| |-- name: string (nullable = true)
What I want to do is add a nested column, say car_brand, to my person structure. How would I do it?
The expected final schema would look like this:
root
|-- city_name: string (nullable = true)
|-- person: struct (nullable = true)
| |-- age: long (nullable = true)
| |-- name: string (nullable = true)
| |-- car_brand: string (nullable = true)
Upvotes: 3
Views: 7666
Reputation: 347
The function withField is available starting with Spark 3.1. As per the doc, "it can be used to add/replace a nested field in StructType by name".
In this case, it can be used as follows:
import org.apache.spark.sql.functions
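// lit wraps a constant value; use functions.col(...) instead to copy the value from an existing column
df.withColumn("person", functions.col("person").withField("car_brand", functions.lit("some car brand here")))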
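For anyone who wants to try this end to end, here is a minimal sketch assuming Spark 3.1 or later and a local session. The case classes, the object name WithFieldSketch, the helper addCarBrand, and the sample row are all made up for illustration; only withField, col and lit come from Spark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object WithFieldSketch {
  // Hypothetical case classes, used only to build sample data with the question's schema.
  case class Person(age: Long, name: String)
  case class Record(city_name: String, person: Person)

  // Adds car_brand to the person struct, or replaces it if the field already exists.
  // Requires Spark 3.1+, where Column.withField was introduced.
  def addCarBrand(df: DataFrame, brand: String): DataFrame =
    df.withColumn("person", col("person").withField("car_brand", lit(brand)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("with-field-sketch")
      .getOrCreate()
    import spark.implicits._

    // One sample row shaped like the dataframe in the question.
    val df = Seq(Record("Berlin", Person(30L, "Anna"))).toDF()

    // The printed schema should now list car_brand under person.
    addCarBrand(df, "bmw").printSchema()

    spark.stop()
  }
}
```

The existing fields of person are left as they are, so there is no need to list them by hand.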
Upvotes: 1
Reputation: 36
import pyspark.sql.functions as func
dF = dF.withColumn(
    "person",
    func.struct(
        "person.age",
        func.struct(
            "person.name",
            func.lit(None).alias("NestedCol_Name")
        ).alias("name")
    )
)
Output schema:
root
|-- city_name: string (nullable = true)
|-- person: struct (nullable = false)
| |-- age: string (nullable = true)
| |-- name: struct (nullable = false)
| | |-- name: string (nullable = true)
| | |-- NestedCol_Name: null (nullable = true)
Upvotes: 1
Reputation: 732
Adding a new nested column within person:
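import org.apache.spark.sql.functions.{lit, struct}
// $"..." needs import spark.implicits._; df must be declared as a var for this reassignment
df = df.withColumn(
  "person",
  struct(
    $"person.*",
    struct(
      lit("value_1").as("person_field_1"),
      lit("value_2").as("person_field_2"),
    ).as("nested_column_within_person")
  )
)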
Final schema :
root
|-- city_name: string (nullable = true)
|-- person: struct (nullable = true)
| |-- age: long (nullable = true)
| |-- name: string (nullable = true)
| |-- nested_column_within_person: struct (nullable = true)
| | |-- person_field_1: string (nullable = true)
| | |-- person_field_2: string (nullable = true)
Upvotes: 2
Reputation: 28322
You can unpack the struct and add its fields to a new one, including the new column at the same time. For example, adding "bmw" to all persons in the dataframe can be done like this:
df.withColumn("person", struct($"person.*", lit("bmw").as("car_brand")))
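A runnable sketch of the same idea, assuming a local Spark session. The case classes, the object name StructRebuildSketch, the helper addCarBrand, and the sample row are hypothetical and only there to reproduce the schema from the question. It uses col("person.*") in place of $"person.*" so the helper does not depend on spark.implicits._.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, struct}

object StructRebuildSketch {
  // Hypothetical case classes, used only to build sample data with the question's schema.
  case class Person(age: Long, name: String)
  case class Record(city_name: String, person: Person)

  // Rebuilds the person struct from its existing fields plus a constant car_brand.
  // col("person.*") expands to every field of person, like $"person.*" does.
  def addCarBrand(df: DataFrame, brand: String): DataFrame =
    df.withColumn("person", struct(col("person.*"), lit(brand).as("car_brand")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("struct-rebuild-sketch")
      .getOrCreate()
    import spark.implicits._

    // One sample row shaped like the dataframe in the question.
    val df = Seq(Record("Berlin", Person(30L, "Anna"))).toDF()

    val updated = addCarBrand(df, "bmw")
    updated.printSchema()
    // The new field can be read back with the usual dot notation.
    updated.select("city_name", "person.car_brand").show()

    spark.stop()
  }
}
```

Because the struct is rebuilt rather than modified in place, this approach does not rely on withField and so also fits Spark versions older than 3.1.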
Upvotes: 5