Reputation: 8449
org.apache.spark.sql.functions.transform applies a function to each element of an array (new in Spark 3.0). However, the PySpark docs don't mention an equivalent function.
(There's pyspark.sql.DataFrame.transform, but that's for transforming DataFrames, not array elements.)
Upvotes: 0
Views: 230
Reputation: 42342
EDIT:
To avoid UDFs, you can call the SQL higher-order function through F.expr('transform(...)'):
import pyspark.sql.functions as F
df = spark.createDataFrame([[[1,2]],[[3,4]]]).toDF('col')
df.show()
+------+
| col|
+------+
|[1, 2]|
|[3, 4]|
+------+
df.select(F.expr('transform(col, x -> x+1)').alias('transform')).show()
+---------+
|transform|
+---------+
| [2, 3]|
| [4, 5]|
+---------+
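If you also need the element's position, the SQL transform accepts a two-argument lambda whose second argument is the zero-based index. A small sketch against the same df as above (the alias with_index is just for display):
df.select(F.expr('transform(col, (x, i) -> x + i)').alias('with_index')).show()
+----------+
|with_index|
+----------+
|    [1, 3]|
|    [3, 5]|
+----------+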
Old answer:
IIUC, the closest equivalent is a UDF, where x+1 is the function applied to each element:
import pyspark.sql.functions as F
from pyspark.sql.types import ArrayType, IntegerType

# Apply x+1 to every element; a UDF needs an explicit return type
add = F.udf(lambda arr: [x + 1 for x in arr], ArrayType(IntegerType()))
df.select(add('col')).show()
+-------------+
|<lambda>(col)|
+-------------+
| [2, 3]|
| [4, 5]|
+-------------+
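Note: since Spark 3.1, PySpark exposes pyspark.sql.functions.transform directly, so neither the expr workaround nor the UDF is needed there. A minimal sketch, assuming a Spark 3.1+ session:
import pyspark.sql.functions as F

# F.transform takes the array column and a Python function over Columns
df.select(F.transform('col', lambda x: x + 1).alias('transform')).show()
+---------+
|transform|
+---------+
|   [2, 3]|
|   [4, 5]|
+---------+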
Upvotes: 1