Devanshu Misra
Devanshu Misra

Reputation: 813

Fill null values with new elements in pyspark df

I am trying to find a way to join two df that have a certain colName common amongst them, and fill the null values with new values by using a func()

I have two DFs

DF1:

+---------------+                                                               
|        colName|
+---------------+
|              a|
|              b|
|              c|
|              d|
+---------------+

and DF2:

+-------+-------------+
|colName|         uuid|
+-------+-------------+
|      a|            1|
|      b|            2|
+-------+-------------+   

After using a specific pyspark funciton like DF1.join(DF2, where(DF1.colName == DF2.colName)) or df.merge() I'd like to obtain a df with the following values

DF1 + DF2:

+-------+-------------+
|colName|         uuid|
+-------+-------------+
|      a|            1|
|      b|            2|
|      c|         null|
|      d|         null|
+-------+-------------+   

and then using a func():

def func():
    doSomethin(DF1+DF2)

The following result is expected:

+-------+-------------+
|colName|         uuid|
+-------+-------------+
|      a|            1|
|      b|            2|
|      c|            3|
|      d|            4|
+-------+-------------+   

As you can see, DF1 just has colName, whereas DF2 has uuid associated. So func() will actually generate a uuid and fill distinct colName with distinct uuid. Any approach towards this would be appreciated. Thanks in advance.

Upvotes: 0

Views: 244

Answers (1)

Lamanus
Lamanus

Reputation: 13541

You can do this with the uuid function simply.

import pyspark.sql.functions as f

df1.join(df2, ['colName'], 'left') \
   .withColumn('uuid', f.coalesce(f.col('uuid'), f.expr('uuid()'))) \
   .show(10, False)

+-------+------------------------------------+
|colName|uuid                                |
+-------+------------------------------------+
|a      |1                                   |
|b      |2                                   |
|c      |ad037b01-4be8-40f2-99fc-b7589b2c648c|
|d      |e59c9315-b626-4dcf-a80a-e16f08f19a96|
+-------+------------------------------------+

Upvotes: 1

Related Questions