Reputation: 121
I have this UDF, which returns an alert severity based on certain conditions:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def alert(name, X_Request, X_Actual):
    if "Impact" in name:
        return "Highest"
    fs = ['FS_00', 'FS_01', 'FS_02', 'FS_03']
    if name in fs:
        if X_Actual < -3:
            return "High"
        elif X_Actual <= -2 and X_Actual >= -3:
            return "Medium"
        elif X_Actual > -3 and X_Actual <= -0.5:
            return "Low"
    return None

alert_type = udf(alert, StringType())
df = df.withColumn("alert_level", alert_type(df["name"], df["x_request"], df["x_actual"]))
Can this be done without applying a UDF, since UDFs slow down performance?
Upvotes: 1
Views: 66
Reputation: 6654
A when().otherwise() chain like the following should work.
import pyspark.sql.functions as func

fs = ['FS_00', 'FS_01', 'FS_02', 'FS_03']

data_sdf. \
    withColumn('alert_level',
               func.when(func.upper(func.col('name')).like('%IMPACT%'), func.lit('Highest')).
               when(func.col('name').isin(fs),
                    func.when(func.col('x_actual') < -3, func.lit('High')).
                    when(func.col('x_actual').between(-3, -2), func.lit('Medium')).
                    when(func.col('x_actual').between(-2, -0.5), func.lit('Low'))
                    )
               )
Note - between() includes the bounds provided to it (it's a >= & <=). But in this case that is safe due to the when() order of evaluation: a value of exactly -2 will already have been matched as "Medium" before the "Low" condition is checked. Any other value that does not match the conditions will result in a null, as no otherwise() condition was provided.
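To make the evaluation-order point concrete without needing a Spark session, here is a plain-Python sketch (hypothetical helper, not part of the answer's code) that mirrors the chained when() logic above: conditions are checked top to bottom, the first match wins, so the shared bound -2 lands in "Medium", and anything unmatched falls through to None, like the missing otherwise().

```python
def alert_level(name, x_actual):
    # Mirrors the when() chain: first matching branch wins.
    fs = ['FS_00', 'FS_01', 'FS_02', 'FS_03']
    if 'IMPACT' in name.upper():          # like('%IMPACT%') on upper-cased name
        return 'Highest'
    if name in fs:
        if x_actual < -3:
            return 'High'
        if -3 <= x_actual <= -2:          # between(-3, -2): bounds inclusive
            return 'Medium'
        if -2 <= x_actual <= -0.5:        # between(-2, -0.5): -2 never reaches here
            return 'Low'
    return None                           # no otherwise() -> null

print(alert_level('FS_01', -2))    # Medium, despite -2 also satisfying the Low bounds
print(alert_level('FS_01', 0))     # None, no condition matched
```

The same first-match-wins behavior is what makes the overlapping between() bounds harmless in the Spark expression.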
Upvotes: 1