Amol

Reputation: 386

How to tackle the SAFE_CAST SQL function in PySpark

We have the below query, which works in the BigQuery environment.

SELECT id, name, SAFE_CAST(value AS FLOAT64) AS resultvalue
FROM patienttable
WHERE SAFE_CAST(value AS FLOAT64) > 0

I need to run that query in a Spark environment using Python.

from pyspark.sql import SparkSession

# obtain the SparkSession before calling spark.sql
spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet(path)
df.createOrReplaceTempView("patienttable")

df2 = spark.sql("""SELECT id, name, SAFE_CAST(value AS FLOAT64) AS resultvalue
FROM patienttable
WHERE SAFE_CAST(value AS FLOAT64) > 0""")

When we run the same query used in BigQuery through PySpark SQL, we get the error below:

ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))

Upvotes: 1

Views: 2919

Answers (2)

Ghost

Reputation: 520

This should work:

new_df = spark.sql("select id, name, cast(value as float) as resultvalue from patienttable WHERE value > 0")

If you need a specific precision for your values, use decimal(precision, scale) instead of float, as in the sketch below.
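For instance, a minimal sketch casting to a decimal with 10 total digits, 2 of them after the decimal point (patienttable and the column names are taken from the question; the precision values are illustrative):

# DECIMAL(10, 2): up to 10 digits total, 2 after the decimal point;
# like casting to float, values that cannot be converted become null
new_df = spark.sql(
    "SELECT id, name, CAST(value AS DECIMAL(10, 2)) AS resultvalue "
    "FROM patienttable WHERE value > 0"
)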

Upvotes: 0

blackbishop

Reputation: 32660

In BigQuery, SAFE_CAST is used to prevent errors from casting. In Spark SQL, the CAST function already returns NULL if the conversion is not possible, and there is no SAFE_CAST function in Spark.
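For example, a minimal sketch (the demo view and sample values are made up for illustration) showing that CAST in Spark SQL returns NULL instead of failing:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "abc" cannot be converted to FLOAT, so CAST yields null rather than an error
spark.createDataFrame(
    [(1, "1.5"), (2, "abc")], ["id", "value"]
).createOrReplaceTempView("demo")

spark.sql("SELECT id, CAST(value AS FLOAT) AS v FROM demo").show()
# +---+----+
# | id|   v|
# +---+----+
# |  1| 1.5|
# |  2|null|
# +---+----+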

Also, FLOAT64 is specific to BigQuery; in Spark SQL you should use just FLOAT. Try this:

df2 = spark.sql("SELECT id, name, CAST(value AS FLOAT) AS resultvalue FROM patienttable WHERE CAST(value AS FLOAT) > 0")
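In case you prefer the DataFrame API, an equivalent sketch (assuming df is the DataFrame registered as the patienttable view):

from pyspark.sql import functions as F

# cast() returns null for values that cannot be converted, mirroring SAFE_CAST
df2 = (
    df.select("id", "name", F.col("value").cast("float").alias("resultvalue"))
      .filter(F.col("value").cast("float") > 0)
)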

Upvotes: 1
