Reputation: 507
I have a DataFrame with rows like:
account_number | null
additional_info | null
data | [,,,,,,,,,,,,,,,,...
dealer_id | null
device_id | 0
device_name | null
device_type | panel
event | setting
event_class | panel_settings_alarm
event_desc | siren_disable is ...
event_timestamp | 1621341705282
event_type | siren_disable
imei | 99000000000000
is_event_desc_available | true
message_uuid | b32e6a3f-bced-4ba...
operation | report
partition_id | 0
priority | 2
software_version | 2.4.2
source | panel
setting_name | siren_disable
The nested column data has multiple key-value pairs. Using the value of the setting_name column as the key, I need to make another column whose value is data[setting_name], e.g. data['siren_disable']. The names in setting_name keep changing. I tried:
df2 = df2.withColumn("setting_value", df2["data"].getItem(df2["setting_name"]))
But I get this error:
AnalysisException: Field name should be String Literal, but it's setting_name#3864;
Any suggestions?
Upvotes: 0
Views: 46
Reputation: 42352
One possible way is to convert the struct-type column to a map type. getItem on a struct requires a literal field name, because struct fields are resolved at analysis time; a map, by contrast, can be indexed with another column:
import pyspark.sql.functions as F

df2 = df.withColumn(
    'setting_value',
    # serialize the struct to JSON, re-parse it as map<string,string>,
    # then index the map with the (non-literal) setting_name column
    F.from_json(F.to_json('data'), 'map<string,string>')[F.col('setting_name')]
)
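For reference, here is a minimal self-contained sketch of the same trick on a toy DataFrame; the struct fields (siren_disable, volume) and the sample values are made up for illustration:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# toy data: a struct column plus a column naming which field to extract
df = spark.createDataFrame(
    [(("on", "120"), "siren_disable"), (("off", "90"), "volume")],
    "data struct<siren_disable:string, volume:string>, setting_name string",
)

df2 = df.withColumn(
    "setting_value",
    # to_json serializes the struct; from_json re-parses it as a map,
    # which (unlike a struct) can be indexed with a non-literal key
    F.from_json(F.to_json("data"), "map<string,string>")[F.col("setting_name")],
)
df2.show(truncate=False)
# row 1 -> setting_value = "on"  (data["siren_disable"])
# row 2 -> setting_value = "90"  (data["volume"])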
Upvotes: 1