Reputation: 411
I have a table with a column of array of structs with this schema:
root
|-- id: long (nullable = true)
|-- mainColumn: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- fieldA: string (nullable = true)
| | |-- fieldB: string (nullable = true)
| | |-- fieldC: string (nullable = true)
| | |-- fieldD: string (nullable = true)
|-- randomOtherColumn: short (nullable = true)
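For reference, a DataFrame with this exact shape can be built like this (a minimal sketch with made-up values, assuming a running SparkSession):
from pyspark.sql import SparkSession
from pyspark.sql.types import (ArrayType, LongType, ShortType, StringType,
                               StructField, StructType)

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("id", LongType()),
    StructField("mainColumn", ArrayType(StructType([
        StructField("fieldA", StringType()),
        StructField("fieldB", StringType()),
        StructField("fieldC", StringType()),
        StructField("fieldD", StringType()),
    ]))),
    StructField("randomOtherColumn", ShortType()),
])

df = spark.createDataFrame(
    [(1, [("a1", "b1", "c1", "d1"), ("a2", "b2", "c2", "d2")], 7)],
    schema,
)
df.printSchema()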
I need to create a query that returns the entire table but applies a function to the fieldA of every entry (an encryption function). I have tried using the transform() function, but I also need to return the other fields (fieldB, fieldC, etc.).
SELECT
x.id,
x.randomOtherColumn,
transform(y -> ???)
FROM
my_table x
Something like this would be simple in JavaScript with the spread operator:
df.map(x => ({
  id: x.id,
  randomOtherColumn: x.randomOtherColumn,
  mainColumn: x.mainColumn.map(y => ({ ...y, fieldA: encrypt(y.fieldA) }))
}))
Upvotes: 2
Views: 411
Reputation: 32660
You need to recreate the inner struct elements of the array like this:
SELECT
id,
randomOtherColumn,
    transform(
        mainColumn,
        x -> struct(
            encrypt(x.fieldA) AS fieldA,
            x.fieldB AS fieldB,
            x.fieldC AS fieldC,
            x.fieldD AS fieldD
        )
    ) AS mainColumn
FROM
my_table
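If you want to try this end to end from PySpark, here is a minimal sketch where the built-in sha2() stands in for your (not shown) encrypt() function, just so the statement runs:
df.createOrReplaceTempView("my_table")

df1 = spark.sql("""
    SELECT
        id,
        randomOtherColumn,
        transform(
            mainColumn,
            x -> struct(
                sha2(x.fieldA, 256) AS fieldA,  -- stand-in for encrypt(x.fieldA)
                x.fieldB AS fieldB,
                x.fieldC AS fieldC,
                x.fieldD AS fieldD
            )
        ) AS mainColumn
    FROM my_table
""")
df1.show(truncate=False)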
Or, using the DataFrame API with the Column method withField (available since Spark 3.1):
import org.apache.spark.sql.functions.{col, transform}

// `encrypt` is assumed to be any function that returns a Column
val df1 = df.withColumn(
  "mainColumn",
  transform(col("mainColumn"), x => x.withField("fieldA", encrypt(x("fieldA"))))
)
As you also tagged pyspark:
from pyspark.sql import functions as F

# `encrypt` is assumed to be any function that returns a Column
df1 = df.withColumn(
    "mainColumn",
    F.transform("mainColumn", lambda x: x.withField("fieldA", encrypt(x["fieldA"])))
)
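To make that snippet self-contained, encrypt can be any function returning a Column; here sha2 is used as a hypothetical stand-in (it hashes rather than encrypts, but it lets the example run):
from pyspark.sql import functions as F

def encrypt(c):
    # Hypothetical stand-in so the example runs; replace with your real encryption.
    return F.sha2(c, 256)

df1 = df.withColumn(
    "mainColumn",
    F.transform("mainColumn", lambda x: x.withField("fieldA", encrypt(x["fieldA"]))),
)
df1.select("id", "mainColumn").show(truncate=False)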
Upvotes: 2