Reputation: 141
I am joining two dataframes using a left join. Rows in the left table may not have a match, so I am trying to set a default value using the coalesce function.
import pyspark.sql.functions as F
joined = t1\
    .join(t2, on="id", how='left')\
    .select(t1["*"],
            F.coalesce(t2.date, F.to_date('2019-01-01')))
I am getting the following error
pyspark.sql.utils.AnalysisException: 'cannot resolve \'CAST(t1.`2019-01-01` AS DATE)\' due to data type mismatch: cannot cast decimal(38,3) to date;;\n\...
I have confirmed that t2.date is in fact a date type. The other t1 columns are of the decimal type seen in the error, so it seems to me that it is trying to cast every column to this date type :S
Any help would be greatly appreciated
Upvotes: 1
Views: 3609
Reputation: 42352
The date string '2019-01-01' is being interpreted as a column name of t1, which is why Spark tries to cast a decimal column to a date. Wrap it in F.lit so it is passed as a literal value instead.
import pyspark.sql.functions as F

# F.lit wraps the date string as a literal column, so to_date parses it as a value
# instead of looking for a t1 column named '2019-01-01'
joined = t1\
    .join(t2, on="id", how='left')\
    .select(t1["*"],
            F.coalesce(t2.date, F.to_date(F.lit('2019-01-01'))))
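As a sanity check, here is a minimal sketch (assuming a local SparkSession and small made-up t1/t2 DataFrames, not your real tables) showing that unmatched rows fall back to the default date:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

# hypothetical sample data: id 2 has no match in t2; t2.date is a real date column
t1 = spark.createDataFrame([(1, 10.5), (2, 20.0)], ["id", "amount"])
t2 = spark.createDataFrame([(1, "2020-06-30")], ["id", "date"]) \
          .withColumn("date", F.to_date("date"))

joined = t1 \
    .join(t2, on="id", how="left") \
    .select(t1["*"],
            F.coalesce(t2["date"], F.to_date(F.lit("2019-01-01"))).alias("date"))

joined.show()
# id 1 keeps its matched date (2020-06-30); id 2 has no match in t2,
# so coalesce falls back to the 2019-01-01 default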
Upvotes: 2