Marc On

Reputation: 141

Pyspark SQL coalesce data type mismatch with date cast

I am joining two dataframes using a left join. Rows in the left table may not have a match, so I am trying to set a default with the coalesce function:

import pyspark.sql.functions as F

joined = t1\
        .join(t2, on="id", how='left')\
        .select(t1["*"],
                F.coalesce(t2.date, F.to_date('2019-01-01')))

I am getting the following error:

pyspark.sql.utils.AnalysisException: 'cannot resolve \'CAST(t1.`2019-01-01` AS DATE)\' due to data type mismatch: cannot cast decimal(38,3) to date;;\n\...

I have confirmed that t2.date is in fact a date type. Other t1 columns have the decimal(38,3) type seen in the error, so it looks to me like Spark is trying to cast one of those columns to the date type :S

Any help would be greatly appreciated.

Upvotes: 1

Views: 3609

Answers (1)

mck

Reputation: 42352

The date string was interpreted as a column name of t1. You should pass it as a literal column with F.lit instead:

import pyspark.sql.functions as F

joined = t1\
        .join(t2, on="id", how='left')\
        .select(t1["*"],
                # F.lit makes '2019-01-01' a string literal; without it, Spark
                # resolves the string as a column reference on t1
                F.coalesce(t2.date, F.to_date(F.lit('2019-01-01')))
               )
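As a sanity check, here is a minimal self-contained sketch of the fix; the toy data, column names, and SparkSession setup are illustrative assumptions, not from the original post:

import pyspark.sql.functions as F
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# toy data: id 2 has no match in t2, so its date falls back to the default
t1 = spark.createDataFrame([(1, 10.5), (2, 20.5)], ["id", "amount"])
t2 = spark.createDataFrame([(1, "2020-06-01")], ["id", "date"]) \
          .withColumn("date", F.to_date("date"))

joined = t1.join(t2, on="id", how="left") \
           .select(t1["*"],
                   F.coalesce(t2.date, F.to_date(F.lit("2019-01-01"))).alias("date"))

joined.show()
# expected (row order may vary):
# +---+------+----------+
# | id|amount|      date|
# +---+------+----------+
# |  1|  10.5|2020-06-01|
# |  2|  20.5|2019-01-01|
# +---+------+----------+

Note the .alias("date") on the coalesce expression; otherwise the result column gets an auto-generated name like coalesce(date, to_date(2019-01-01)).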

Upvotes: 2
