Gravity Boy
Gravity Boy

Reputation: 7

How to replace null values in a dataframe based on values in other dataframe?

Here's a dataframe, df1, I have

+---------+-------+---------+
|   C1    |   C2  |   C3    |
+---------+-------+---------+
|    xr   |   1   |   ixfg  |
|    we   |   5   |   jsfd  |
|    r5   |   7   |   hfga  |
|    by   |   8   |   srjs  |
|    v4   |   4   |   qwks  |
|    c0   |   0   |   khfd  |
|    ba   |   2   |   gdbu  |
|    mi   |   1   |   pdlo  |
|    lp   |   7   |   ztpq  |
+---------+-------+---------+

Here's another, df2, that I have

+----------+-------+---------+
|    V1    |   V2  |   V3    |
+----------+-------+---------+
|    Null  |   6   |   ixfg  |
|    Null  |   2   |   jsfd  |
|    Null  |   2   |   hfga  |
|    Null  |   7   |   qwks  |
|    Null  |   1   |   khfd  |
|    Null  |   9   |   gdbu  |
+----------+-------+---------+

What I would like to have is another dataframe that

  1. Ignores values in V2 and takes values in C2 whereever V3 and C3 match, and
  2. Replaces V1 with values in C1 wherever V3 and C3 match.

The result should look like the following:

+----------+-------+---------+
|    M1    |   M2  |   M3    |
+----------+-------+---------+
|    xr    |   1   |   ixfg  |
|    we    |   5   |   jsfd  |
|    r5    |   7   |   hfga  |
|    v4    |   4   |   qwks  |
|    c0    |   0   |   khfd  |
|    ba    |   2   |   gdbu  |
+----------+-------+---------+

Upvotes: 0

Views: 470

Answers (1)

Emma
Emma

Reputation: 9308

You can join and use coalesce to take a value which has a higher priority.

** coalesce will take any number of columns (the highest priority to least in the order of arguments) and return first non-null value, so if you do want to replace with null when there is a null in the lower priority column, you cannot use this function.

df = (df1.join(df2, on=(df1.C3 == df2.V3))
      .select(F.coalesce(df1.C1, df2.V1).alias('M1'), 
              F.coalesce(df2.V2, df1.C2).alias('M2'), 
              F.col('C3').alias('M3')))

Upvotes: 1

Related Questions