Reputation: 923
I have dataframe like this:
column_1 column_2 column_3
[1,3] [2] 2
[1,2,3] null 1
[3,4] [6] 1
I want to append the value of column_2 to column_1 if it's not null and del column_2.
Desired output:
column_1 column_3
[1,3,2] 2
[1,2,3] 1
[3,4,6] 1
Upvotes: 0
Views: 353
Reputation: 24416
array_union
from pyspark.sql import functions as F
df = spark.createDataFrame(
[([1,3], [2], 2),
([1,2,3], None, 1),
([3,4], [6], 1)],
['column_1', 'column_2', 'column_3']
)
df = df.select(
F.when(F.isnull('column_2'), F.col('column_1')).otherwise(F.array_union('column_1', 'column_2')).alias('column_1'),
'column_3'
)
df.show()
# +---------+--------+
# | column_1|column_3|
# +---------+--------+
# |[1, 3, 2]| 2|
# |[1, 2, 3]| 1|
# |[3, 4, 6]| 1|
# +---------+--------+
Upvotes: 1