I have a DataFrame like the one below:
id contact_persons
-----------------------
1 [[abc, [email protected], 896676, manager],[pqr, [email protected], 89809043, director],[stu, [email protected], 09909343, programmer]]
The schema looks like this:
root
 |-- id: string (nullable = true)
 |-- contact_persons: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: string (containsNull = true)
I need to convert it to the following schema:
root
 |-- id: string (nullable = true)
 |-- contact_persons: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- emails: string (nullable = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- phone: string (nullable = true)
 |    |    |-- roles: string (nullable = true)
I know PySpark has a struct function, but I don't know how to use it here because the outer array has a dynamic size.
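You can use the TRANSFORM higher-order function (available in Spark SQL since 2.4) inside an expression to convert each inner array into a struct: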
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()

# Sample data: each inner array is [name, email, phone, role]
df = spark.createDataFrame([
    ['1', [['abc', '[email protected]', '896676', 'manager'],
           ['pqr', '[email protected]', '89809043', 'director'],
           ['stu', '[email protected]', '09909343', 'programmer']]]
], schema='id string, contact_persons array<array<string>>')

# TRANSFORM applies the lambda to every inner array, however many there are;
# STRUCT then builds a named struct from fixed positions inside each one.
expression = 'TRANSFORM(contact_persons, el -> STRUCT(el[0] AS name, el[1] AS emails, el[2] AS phone, el[3] AS roles))'
output_df = df.withColumn('contact_persons', f.expr(expression))

output_df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- contact_persons: array (nullable = true)
#  |    |-- element: struct (containsNull = false)
#  |    |    |-- name: string (nullable = true)
#  |    |    |-- emails: string (nullable = true)
#  |    |    |-- phone: string (nullable = true)
#  |    |    |-- roles: string (nullable = true)

output_df.show(truncate=False)
+---+-----------------------------------------------------------------------------------------------------------------------+
|id |contact_persons |
+---+-----------------------------------------------------------------------------------------------------------------------+
|1 |[{abc, [email protected], 896676, manager}, {pqr, [email protected], 89809043, director}, {stu, [email protected], 09909343, programmer}]|
+---+-----------------------------------------------------------------------------------------------------------------------+
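To see why the dynamic length of contact_persons is not a problem, here is a plain-Python sketch of what the TRANSFORM expression computes for a single row: the lambda runs once per inner array, and only the positions inside each inner array are fixed. The function name transform_contacts is hypothetical, and returning None for a missing position is an assumption meant to mirror Spark 3.x's default (non-ANSI) behavior, where an out-of-range el[i] yields NULL:

```python
# Field names in the same positions the TRANSFORM expression uses
FIELDS = ('name', 'emails', 'phone', 'roles')


def transform_contacts(contact_persons):
    """Hypothetical pure-Python analogue of
    TRANSFORM(contact_persons, el -> STRUCT(el[0] AS name, ...))."""
    return [
        # One dict ("struct") per inner array; a missing position becomes None
        {field: (el[i] if i < len(el) else None) for i, field in enumerate(FIELDS)}
        for el in contact_persons
    ]


print(transform_contacts([['abc', 'abc@example.com', '896676', 'manager']]))
```

Whether the outer list has one, three, or zero entries, the output simply has the same number of structs.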