Simi J

Reputation: 1

How to convert a struct nested inside an array of structs into a string in PySpark

root
 |-- id: long (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- resource: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- alias: string (nullable = true)
 |    |-- id: string (nullable = true)
 |-- school: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- teacher: struct (nullable = true)
 |    |    |    |-- sys_id: string (nullable = true)
 |    |    |    |-- ip: string (nullable = true)
 |    |    |-- Partition: string (nullable = true)

and I want to convert it to:

root
 |-- id: long (nullable = true)
 |-- person: struct (nullable = true)
 |    |-- resource: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- alias: string (nullable = true)
 |    |-- id: string (nullable = true)
 |-- school: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- teacher: string (nullable = true)
 |    |    |-- Partition: string (nullable = true)

In other words, I want to convert the teacher struct into a string in PySpark.

I tried using functions.transform together with withField on the teacher struct, but I always get the following error:

    AnalysisException: cannot resolve 'update_fields(school, WithField(concat_ws(',', 'teacher.*')))' due to data type mismatch: struct argument should be struct type, got: array<struct<teacher:struct<sys_id:string,ip:string>,Partition:string>>;

This is the code I tried:

    df1 = df1.withColumn("school", functions.transform(
        functions.col("school").withField("teacher", functions.expr("concat_ws(',', 'teacher.*')")),
        lambda x: x.cast("string")))

Upvotes: 0

Views: 424

Answers (1)

Simi J

Reputation: 1

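    from pyspark.sql import functions
    # For every element of "school", replace the nested "teacher" struct with its string cast
    df1 = df1.withColumn("school", functions.transform(functions.col("school"),
                         lambda x: x.withField("teacher", x["teacher"].cast("string"))))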

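This worked for me. The key difference is that withField is applied to each element x inside transform, not to the array column itself.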

Upvotes: 0
