David

Reputation: 103

How to use the transform API in Spark 3.0.0?

As you know, the transform API has been added in Spark 3.0.0, but I have tried it and can't figure out how to use it, and I can't find any usage examples on Google. Can anyone show me an example? Thank you!

What I have tried:

    val source = spark.read.format("json").option("multiLine", "true").load("/home/user/Desktop/test.json")
    source.select(transform($"array0", x => struct($"x.a".as("A"))))
org.apache.spark.sql.AnalysisException: cannot resolve '`x.a`' given input columns: [array0];;
'Project [transform(array0#0, lambdafunction(named_struct(NamePlaceholder, 'x.a), lambda x#4, false)) AS transform(array0, lambdafunction(named_struct(NamePlaceholder(), x.a AS `A`), x))#3]
+- RelationV2[array0#0] json file:/home/usr/Desktop/test.json

My source JSON:

{
    "array0":[
        {
            "a":"0",
            "b":"1"
        }
    ]
}

Upvotes: 0

Views: 68

Answers (1)

David Vrba

Reputation: 3344

If you mean the higher-order function transform used with arrays, here is a simple working example:

val df = spark.range(2).withColumn("arr", array(lit(1), lit(2)))

df.withColumn("x", transform($"arr", x => x + 1)).show()

+---+------+------+
| id|   arr|     x|
+---+------+------+
|  0|[1, 2]|[2, 3]|
|  1|[1, 2]|[2, 3]|
+---+------+------+
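
For completeness, the same transformation can also be written with the SQL lambda syntax through expr (available since Spark 2.4), for example:

    // Equivalent form using the SQL higher-order function syntax.
    df.withColumn("x", expr("transform(arr, x -> x + 1)")).show()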

In your example, since you have structs inside the array, you can access the fields of the struct as follows:

    source.withColumn("x", transform($"array0", x => x.getItem("a")))
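
The original select in the question then comes down to building a new struct from the lambda variable itself; a minimal sketch, assuming source is the DataFrame loaded from the JSON above:

    // Rebuild each struct from the lambda variable `x`, renaming field "a" to "A".
    source
      .select(transform($"array0", x => struct(x.getField("a").as("A"))).as("array0"))
      .show(false)

Inside the lambda, x already refers to the current array element, so its fields are reached with getField or getItem rather than with a $"x.a" column reference; that is what caused the cannot resolve '`x.a`' error.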

Upvotes: 1
