Reputation: 386
We have the below query, which works in the BigQuery environment:
SELECT id, name, SAFE_CAST(value AS FLOAT64) AS resultvalue FROM patienttable
WHERE SAFE_CAST(value AS FLOAT64) > 0
I need to run that query in a Spark environment using Python.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet(path)
df.createOrReplaceTempView("patienttable")
df2 = spark.sql("""SELECT id, name, SAFE_CAST(value AS FLOAT64) AS resultvalue FROM patienttable
WHERE SAFE_CAST(value AS FLOAT64) > 0""")
When we run the same query used in BigQuery through spark.sql, we get the error below:
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
Upvotes: 1
Views: 2919
Reputation: 520
This should work:
new_df = spark.sql("select id, name, cast(value as float) as resultvalue from patienttable WHERE value > 0")
If you want a specific number of decimal places in your data, use decimal(a, b) instead of float.
Upvotes: 0
Reputation: 32660
In BigQuery, SAFE_CAST is used to prevent errors from casting. In Spark SQL, the cast function returns null if the conversion is not possible, and there is no SAFE_CAST function in Spark.
Also, you're using FLOAT64, which is specific to BigQuery too; you should be using just float. Try this:
df2 = spark.sql("SELECT id, name, CAST(value AS FLOAT) AS resultvalue FROM patienttable WHERE CAST(value AS FLOAT) > 0")
Upvotes: 1