peeps

Reputation: 53

How to solve data mismatch error in pyspark due to trim function?

df1 = (df.withColumn("columnName_{}".format(columnName), psf.lit(columnName))
    .withColumn("{}_not_null".format(columnName), psf.when((psf.col(columnName).isNotNull()& psf.trim(psf.col(columnName))!= psf.lit('') ),1))

When I run this code, I get the following error:

cannot resolve '((`Address` IS NOT NULL) AND trim(`Address`))' due to data type mismatch: differing types in '((`Address` IS NOT NULL) AND trim(`Address`))' (boolean and string)

Can anyone please help me solve this error?

Upvotes: 0

Views: 372

Answers (1)

mck

Reputation: 42352

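You need to wrap the second condition in parentheses, because in Python & has higher precedence than !=. Without them the expression is parsed as (isNotNull() & trim(col)) != lit(''), so Spark tries to AND a boolean with a string, which is exactly the type mismatch in the error message: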

df1 = df.withColumn(
    "columnName_{}".format(columnName),
    psf.lit(columnName)
).withColumn(
    "{}_not_null".format(columnName),
    psf.when(
        psf.col(columnName).isNotNull()
        & (psf.trim(psf.col(columnName)) != psf.lit('')),
        1,
    )
)
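
If you want to see the grouping for yourself without starting Spark, here is a small sketch using Python's standard ast module. The names is_not_null, trimmed and empty are just placeholders standing in for the column expressions from the question; nothing here touches pyspark.

```python
import ast

# Parse the unparenthesized form from the question (placeholder names).
top = ast.parse("is_not_null & trimmed != empty", mode="eval").body

# The outermost node is the comparison, and its left side is the `&`
# expression, i.e. Python reads it as (is_not_null & trimmed) != empty.
print(type(top).__name__)          # Compare
print(type(top.left).__name__)     # BinOp
print(type(top.left.op).__name__)  # BitAnd

# With explicit parentheses, `&` is the outermost node and the
# comparison is its right operand, which is the intended grouping.
fixed = ast.parse("is_not_null & (trimmed != empty)", mode="eval").body
print(type(fixed).__name__)        # BinOp
print(type(fixed.right).__name__)  # Compare
```

The same rule applies to | and to the other comparison operators (==, <, >=, and so on), so it is safest to parenthesize every comparison that you combine with & or | in a Column expression.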

Upvotes: 1
