2713

Reputation: 195

Selecting struct type columns with '.' in the column name in PySpark

How do I select the fields under "cat_item.category" in PySpark? The schema is as follows:

root
 |-- result: struct (nullable = true)
 |    |-- active: string (nullable = true)
 |    |-- cat_item.category: struct (nullable = true)
 |    |    |-- display_value: string (nullable = true)
 |    |    |-- link: string (nullable = true)
 |    |-- number: string (nullable = true)
 |    |-- sys_id: string (nullable = true)

I tried the following, but I get an error:

df22 = df22.select("result.active", "result.cat_item.category.display_value", "result.cat_item.category.link", "result.number", "result.sys_id")

How do I select the struct columns?

Upvotes: 0

Views: 402

Answers (1)

blackbishop

Reputation: 32710

The field name cat_item.category itself contains a dot, which Spark otherwise treats as a nested-field separator, so you need to escape the name with backticks (`):

df22.select("result.`cat_item.category`.display_value")
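To cover all five columns from the question, the same escaping can be applied when building each path. Below is a minimal sketch in plain Python; quote_field and field_path are hypothetical helper names (not part of PySpark), and it assumes field names never contain a backtick themselves.

```python
def quote_field(name: str) -> str:
    """Wrap a field name in backticks if it contains a dot (hypothetical helper)."""
    return f"`{name}`" if "." in name else name


def field_path(*parts: str) -> str:
    """Join field names into a dotted path, escaping any name that contains a dot."""
    return ".".join(quote_field(p) for p in parts)


# The five columns from the question, with the dotted struct name escaped.
columns = [
    field_path("result", "active"),
    field_path("result", "cat_item.category", "display_value"),
    field_path("result", "cat_item.category", "link"),
    field_path("result", "number"),
    field_path("result", "sys_id"),
]

for col in columns:
    print(col)
```

With the DataFrame from the question, the list would then be passed as df22.select(*columns).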

Upvotes: 1
