Amol

Reputation: 386

How to tackle the SAFE_CAST SQL function in PySpark

We have the below query, which works in the BigQuery environment.

SELECT id, name, SAFE_CAST(value AS FLOAT64) AS resultvalue
FROM patienttable
WHERE SAFE_CAST(value AS FLOAT64) > 0

I need to run that query in a Spark environment using Python.

from pyspark.sql import SparkSession

# obtain the SparkSession before calling spark.sql
spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet(path)
df.createOrReplaceTempView("patienttable")

df2 = spark.sql("""SELECT id, name, SAFE_CAST(value AS FLOAT64) AS resultvalue
FROM patienttable
WHERE SAFE_CAST(value AS FLOAT64) > 0""")

When we run the same query used in BigQuery through PySpark SQL, we get the error below:

ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))

Upvotes: 1

Views: 2919

Answers (2)

Ghost

Reputation: 520

This should work:

new_df = spark.sql("select id, name, cast(value as float) as resultvalue from patienttable WHERE value > 0")

If you need a specific precision for your values, use decimal(precision, scale) instead of float, as in the sketch below.
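For instance, a minimal sketch casting to a decimal with 10 total digits, 2 of them after the decimal point (patienttable and the column names are taken from the question; the precision values are illustrative):

# DECIMAL(10, 2): up to 10 digits total, 2 after the decimal point;
# like casting to float, values that cannot be converted become null
new_df = spark.sql(
    "SELECT id, name, CAST(value AS DECIMAL(10, 2)) AS resultvalue "
    "FROM patienttable WHERE value > 0"
)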

Upvotes: 0

blackbishop

Reputation: 32660

In BigQuery, SAFE_CAST is used to prevent errors from casting. In Spark SQL, the CAST function already returns NULL if the conversion is not possible, and there is no SAFE_CAST function in Spark.
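For example, a minimal sketch (the demo view and sample values are made up for illustration) showing that CAST in Spark SQL returns NULL instead of failing:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "abc" cannot be converted to FLOAT, so CAST yields null rather than an error
spark.createDataFrame(
    [(1, "1.5"), (2, "abc")], ["id", "value"]
).createOrReplaceTempView("demo")

spark.sql("SELECT id, CAST(value AS FLOAT) AS v FROM demo").show()
# +---+----+
# | id|   v|
# +---+----+
# |  1| 1.5|
# |  2|null|
# +---+----+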

Also, FLOAT64 is specific to BigQuery; in Spark SQL you should use just FLOAT. Try this:

df2 = spark.sql("SELECT id, name, CAST(value AS FLOAT) AS resultvalue FROM patienttable WHERE CAST(value AS FLOAT) > 0")
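In case you prefer the DataFrame API, an equivalent sketch (assuming df is the DataFrame registered as the patienttable view):

from pyspark.sql import functions as F

# cast() returns null for values that cannot be converted, mirroring SAFE_CAST
df2 = (
    df.select("id", "name", F.col("value").cast("float").alias("resultvalue"))
      .filter(F.col("value").cast("float") > 0)
)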

Upvotes: 1
