Reputation: 18108
This works provided no null values exist in the array passed to the PySpark UDF:
from pyspark.sql.functions import udf
from pyspark.sql.types import ArrayType, StringType

concat_udf = udf(
    lambda con_str, arr: [x + con_str for x in arr], ArrayType(StringType())
)
I do not see how to adapt this with a null / None check using an if. How can the following attempt, which does not work, be corrected?
concat_udf = udf(lambda con_str, arr: [ if x is None: 'XXX' else: x + con_str for x in arr ], ArrayType(StringType()))
I can find no such example. Using if with transform had no success either.
+----------+--------------+--------------------+
|      name|knownLanguages|          properties|
+----------+--------------+--------------------+
|     James| [Java, Scala]|[eye -> brown, ha...|
|   Michael|[Spark, Java,]|[eye ->, hair -> ...|
|    Robert|    [CSharp, ]|[eye -> , hair ->...|
|Washington|          null|                null|
| Jefferson|        [1, 2]|                  []|
+----------+--------------+--------------------+
should become
+----------+------------------------+--------------------+
|      name|          knownLanguages|          properties|
+----------+------------------------+--------------------+
|     James|     [JavaXXX, ScalaXXX]|[eye -> brown, ha...|
|   Michael|[SparkXXX, JavaXXX, XXX]|[eye ->, hair -> ...|
|    Robert|        [CSharpXXX, XXX]|[eye -> , hair ->...|
|Washington|                     XXX|                null|
| Jefferson|            [1XXX, 2XXX]|                  []|
+----------+------------------------+--------------------+
Upvotes: 0
Views: 1084
Reputation: 15318
Using Python's ternary operator (a conditional expression), I would do something like this:
concat_udf = udf(
    lambda con_str, arr: [x + con_str if x is not None else "XXX" for x in arr]
    if arr is not None
    else ["XXX"],
    ArrayType(StringType()),
)
# OR
concat_udf = udf(
    lambda con_str, arr: [
        x + con_str if x is not None else "XXX" for x in arr or [None]
    ],
    ArrayType(StringType()),
)
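To check the null handling without a Spark session, the same logic can be pulled out into a plain Python function. This is only a sketch: the name `append_suffix` is made up for illustration, and it mirrors the first variant above (a null array becomes `["XXX"]`).

```python
from typing import List, Optional


def append_suffix(con_str: str, arr: Optional[List[Optional[str]]]) -> List[str]:
    """Append con_str to every element; a null element or a null array yields "XXX"."""
    if arr is None:
        # Whole column value is null: return a one-element array.
        return ["XXX"]
    # Conditional expression: the value comes first, then `if ... else ...`.
    return [x + con_str if x is not None else "XXX" for x in arr]


# Plain-Python checks; in Spark this function could be wrapped as
# udf(append_suffix, ArrayType(StringType())).
print(append_suffix("XXX", ["Spark", "Java", None]))
print(append_suffix("XXX", None))
print(append_suffix("XXX", []))
```

Note that the two variants above differ on empty arrays: `arr or [None]` treats `[]` as falsy and returns `["XXX"]`, while the `arr is not None` check (and this sketch) returns `[]`.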
Upvotes: 1