Reputation: 1
I have two struct columns:
email_struct:
|-- email_struct: struct (nullable = true)
| |-- UsageTypeDesc: string (nullable = true)
| |-- ContactInfo: struct (nullable = false)
| | |-- ElectronicAddress: struct (nullable = false)
| | | |-- AddressSubtype: string (nullable = false)
| | | |-- SourceSystemTypeDesc: string (nullable = false)
| | | |-- ElectronicAddressTxt: string (nullable = true)
phone_struct:
|-- phone_struct: struct (nullable = true)
| |-- UsageTypeDesc: string (nullable = true)
| |-- ContactInfo: struct (nullable = false)
| | |-- TelephoneNumber: struct (nullable = false)
| | | |-- AddressSubtype: string (nullable = false)
| | | |-- SourceSystemTypeDesc: string (nullable = false)
| | | |-- TelephoneNum: string (nullable = true)
How can I create an array of structs in PySpark that looks like this?
"ContactInfo": [
{
"UsageTypeDesc": "PHONE",
"ContactInfo": {
"TelephoneNumber": {
"AddressSubtype": "TELEPHONE NUMBER",
"SourceSystemTypeDesc": "",
"TelephoneNum": ""
}
}
},
{
"UsageTypeDesc": "EMAIL",
"ContactInfo": {
"ElectronicAddress": {
"AddressSubtype": "EMAIL ADDRESS",
"SourceSystemTypeDesc": "",
"ElectronicAddressTxt": ""
}
}
}
]
If I use the following code:
from pyspark.sql.functions import array, col

df = df.withColumn("ContactInfo", array(
    col("phone_struct"),
    col("email_struct"),
))
I get this error:
pyspark.errors.exceptions.captured.AnalysisException: [DATATYPE_MISMATCH.DATA_DIFF_TYPES] Cannot resolve "array(phone_struct, email_struct)" due to data type mismatch: Input to array should all be the same type, but it's ("STRUCT<UsageTypeDesc: STRING, ContactInfo: STRUCT<TelephoneNumber: STRUCT<AddressSubtype: STRING, SourceSystemTypeDesc: STRING, TelephoneNum: STRING>>>" or "STRUCT<UsageTypeDesc: STRING, ContactInfo: STRUCT<ElectronicAddress: STRUCT<AddressSubtype: STRING, SourceSystemTypeDesc: STRING, ElectronicAddressTxt: STRING>>>").;
Upvotes: 0
Views: 85
Reputation: 19610
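You cannot create an ArrayType from two fields that have different schemas: every element of an array must have the same type, and your two structs differ in the field inside ContactInfo (TelephoneNumber vs. ElectronicAddress). I am not sure about your use case, but you could instead combine the two fields in another struct: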
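from pyspark.sql import functions as F

# The resulting struct has two fields, named phone_struct and email_struct
df = df.withColumn("ContactInfo", F.struct(
    F.col("phone_struct"),
    F.col("email_struct"),
))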
Upvotes: 0