New column creation based on if and else condition using pyspark

Question

I have two Spark dataframes, and I want to add a new column named "seg" to dataframe df2 based on the following condition:

whether the df2.colx value is present in df1.colx.

I tried the operation below in PySpark, but it throws an exception.

cc002 = df2.withColumn('seg',F.when(df2.colx == df1.colx,"True").otherwise("FALSE"))

df1:

id  colx  coly
1   678   56789
2   900   67890
3   789   67854

df2:

Name   colx
seema  900
yash   678
deep   800
harsh  900

My expected output is:

Name  colx   seg
seema 900    True
harsh 900    True
yash  678    True
deep  800    False

Please help me correct the given PySpark code, or suggest a better way of doing it.

Ankit Kumar Namdeo · Accepted Answer

If I understand your question correctly, this is what you want to do:

res = df2.join(
    df1,
    on="colx",
    how = "left"
).select(
    "Name",
    "colx"
).withColumn(
    "seg",
    F.when(F.col(colx).isNull(),F.lit(True)).otherwise(F.lit(False))
)

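Let me know if this is the solution you want.

My bad, I wrote the code above in a hurry and it is incorrect. The corrected version is below. It tags every row of df1 with a marker column "check", left-joins df2 to it on colx, and sets seg to true wherever "check" is not null, i.e. wherever the join found a match: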

import pyspark.sql.functions as F

df1 = sqlContext.createDataFrame([[1,678,56789],[2,900,67890],[3,789,67854]],['id', 'colx', 'coly'])

df2 = sqlContext.createDataFrame([["seema",900],["yash",678],["deep",800],["harsh",900]],['Name', 'colx'])

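res = df2.join(
    # Marker column: non-null only for rows that come from df1
    df1.withColumn(
        "check",
        F.lit(1)
    ),
    on="colx",
    how = "left"  # keep every row of df2, matched or not
).withColumn(
    "seg",
    # A null "check" means the colx value was not found in df1
    F.when(F.col("check").isNotNull(),F.lit(True)).otherwise(F.lit(False))
).select(
    "Name",
    "colx",
    "seg"
)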

res.show()

+-----+----+-----+
| Name|colx|  seg|
+-----+----+-----+
| yash| 678| true|
|seema| 900| true|
|harsh| 900| true|
| deep| 800|false|
+-----+----+-----+
