Convert SQL Statement into PySpark

Question

I have a statement in MySQL that I'm trying to convert into PySpark:

my_table_name = "default_engagements"
query = """UPDATE """ + my_table_name + """ SET Engagement = CASE WHEN LinkedAccountId IN ('123456778910', '1098765432123', '254325678912', '429576512356') THEN '808000000298' END WHERE Engagement IS NULL OR Engagement RLIKE '^[a-zA-Z]'; """

I found this example for PySpark:

from pyspark.sql import functions as F
update_func = (F.when(F.col('update_col') == replace_val, new_value)
                .otherwise(F.col('update_col')))

But I don't know how to adapt it to the SQL above. Can someone help me with the syntax? I want the updated data in a DataFrame so that I can write it to S3.

ARCrow · Accepted Answer

Is this what you're looking for?

import pyspark.sql.functions as f

# Replace Engagement only for the listed accounts whose current value is NULL or starts with a letter
default_engagements = default_engagements.withColumn(
    'Engagement',
    f.when(
        f.col('LinkedAccountId').isin('123456778910', '1098765432123', '254325678912', '429576512356')
        & (f.col('Engagement').isNull() | f.col('Engagement').rlike('^[a-zA-Z]')),
        '808000000298'
    ).otherwise(f.col('Engagement'))
)
