Pyspark Replace DF Value When Value Is In List

Question

I'm trying to write a pyspark script to scrub information from a pyspark df. The df I have looks like:

hashed_customer     firstname    lastname    email   order_id    status          timestamp
   eater 1_uuid   1_firstname  1_lastname  1_email    12345    OPTED_IN     2020-05-14 20:45:15
   eater 2_uuid   2_firstname  2_lastname  2_email    23456    OPTED_IN     2020-05-14 20:29:22
   eater 3_uuid   3_firstname  3_lastname  3_email    34567    OPTED_IN     2020-05-14 19:31:55
   eater 4_uuid   4_firstname  4_lastname  4_email    45678    OPTED_IN     2020-05-14 17:49:27

I have another pyspark df listing the customers I need to remove from the customer_temp_tb table. It looks like this:

hashed_customer    eaterstatus
   eater 1_uuid      OPTED_OUT
   eater 3_uuid      OPTED_OUT

I'm trying to find a way to remove the firstname, lastname, and email from the first df if the user is in the second df. So far, I've created a list of the hashed_customers from the second df using:

cust_opt_out_id = [row.hashed_customer for row in df_out.collect()]

Now, I'm trying to find a way to remove firstname, lastname, and email from the first df if the hashed_customer ID is in the second df so that the end result would look like:

hashed_customer     firstname    lastname    email   order_id    status          timestamp
   eater 1_uuid           NaN         NaN      NaN    12345    OPTED_IN     2020-05-14 20:45:15
   eater 2_uuid   2_firstname  2_lastname  2_email    23456    OPTED_IN     2020-05-14 20:29:22
   eater 3_uuid           NaN         NaN      NaN    34567    OPTED_IN     2020-05-14 19:31:55
   eater 4_uuid   4_firstname  4_lastname  4_email    45678    OPTED_IN     2020-05-14 17:49:27

How can I create a function to do this? I know in pandas it would be a simple:

df_in.loc[df_in['hashed_customer'].isin(cust_opt_out_id), ['firstname', 'lastname', 'email']] = np.nan

But this doesn't work in pyspark.
