syv

Reputation: 3608

Replacing a column value conditionally based on another column in a DataFrame

I'm creating a new column based on another column's value:

targetDf = resultDataFrame.withColumn("weekday",psf.when(resultDataFrame["day"] == 0 , 'MON')
    .when(resultDataFrame["day"] == 1 , 'TUE')
    .when(resultDataFrame["day"] == 2 , 'WED')
    .when(resultDataFrame["day"] == 3 , 'THU')
    .when(resultDataFrame["day"] == 4 , 'FRI')
    .when(resultDataFrame["day"] == 5 , 'SAT')
    .otherwise('SUN'))      

I would like to simplify this to something like:

dayList = ['SUN' , 'MON' , 'TUE' , 'WED' , 'THR' , 'FRI' , 'SAT']
resultDataFrame.withColumn("weekday" , dayList[resultDataFrame.day])

But I'm getting an error saying the index must be an integer, not a column. Is there another option?

Upvotes: 1

Views: 562

Answers (1)

akuiper

Reputation: 215117

Example data:

df = spark.createDataFrame([[0], [3], [5]], ['day'])
df.show()
+---+
|day|
+---+
|  0|
|  3|
|  5|
+---+

Use reduce to create the chained Column expression:

import pyspark.sql.functions as F
from functools import reduce

dayList = ['SUN', 'MON', 'TUE', 'WED', 'THR', 'FRI', 'SAT']
df.withColumn('weekday', reduce(lambda col, i: col.when(df.day == i, dayList[i]), range(7), F)).show()
+---+-------+
|day|weekday|
+---+-------+
|  0|    SUN|
|  3|    WED|
|  5|    FRI|
+---+-------+

Where reduce generates a column expression as:

reduce(lambda col, i: col.when(df.day == i, dayList[i]), range(7), F)
# Column<b'CASE WHEN (day = 0) THEN SUN WHEN (day = 1) THEN MON WHEN (day = 2) THEN TUE WHEN (day = 3) THEN WED WHEN (day = 4) THEN THR WHEN (day = 5) THEN FRI WHEN (day = 6) THEN SAT END'>
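Note that the initial value passed to reduce is the module F itself, so the first iteration calls F.when(...) and every later iteration calls .when(...) on the resulting Column. To see the folding mechanics outside Spark, here is a minimal pure-Python sketch (FakeCol is a hypothetical stand-in for a Column, not part of PySpark; each .when appends one CASE branch, just as the chained calls above do):

```python
from functools import reduce

# FakeCol is a hypothetical stand-in for a Spark Column: each .when
# call returns a new object with one more CASE branch appended,
# mirroring how reduce chains F.when(...).when(...) above.
class FakeCol:
    def __init__(self, expr="CASE"):
        self.expr = expr

    def when(self, cond, value):
        return FakeCol(f"{self.expr} WHEN {cond} THEN {value}")

dayList = ['SUN', 'MON', 'TUE', 'WED', 'THR', 'FRI', 'SAT']
expr = reduce(lambda col, i: col.when(f"day = {i}", dayList[i]),
              range(7), FakeCol()).expr + " END"
print(expr)
# CASE WHEN day = 0 THEN SUN WHEN day = 1 THEN MON ... WHEN day = 6 THEN SAT END
```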

Or make a udf:

df.withColumn('weekday', F.udf(lambda day: dayList[day])(df.day)).show()
+---+-------+
|day|weekday|
+---+-------+
|  0|    SUN|
|  3|    WED|
|  5|    FRI|
+---+-------+
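The udf simply wraps a plain list lookup, so it is the most direct translation of the dayList[resultDataFrame.day] idea from the question (at the cost of Python serialization overhead per row). Stripped of Spark, the same lambda behaves like this:

```python
dayList = ['SUN', 'MON', 'TUE', 'WED', 'THR', 'FRI', 'SAT']

# The udf body is just positional indexing into the list.
lookup = lambda day: dayList[day]

print(lookup(0), lookup(3), lookup(5))  # SUN WED FRI
```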

Upvotes: 2

Related Questions