Reputation: 1
I have two DataFrames. When the column CONCERN of Dataframe2 contains 'ALL', I want the new column "EFFECTIVITY" (in the same DataFrame) to hold a list of all the serial numbers from the column "SN" of Dataframe1.
Here df1 is Dataframe1 and df2 is Dataframe2.
all_data = df1.select(collect_list("SN")).show()
df = df.withColumn("EFFECTIVITY", F.when(df2.CONCERN.contains('ALL'), all_data).otherwise(''))
Upvotes: 0
Views: 49
Reputation: 5
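Try the approach below; it may solve your problem. In your code, .show() only prints the result and returns None, so all_data never holds the list. Use .first()[0] instead to get the collected values back as a Python list.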
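from pyspark.sql import functions as F
# Collect every SN value from df1 into a single Python list on the driver
all_data = df1.select(F.collect_list("SN")).first()[0]
# when()/otherwise() expect Column values, so wrap the list as an array of literals
all_sn = F.array(*[F.lit(sn) for sn in all_data])
# Rows whose CONCERN contains 'ALL' get the full list; all other rows get an empty array
df2 = df2.withColumn("EFFECTIVITY", F.when(F.col("CONCERN").contains("ALL"), all_sn).otherwise(F.array()))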
Upvotes: 0