Reputation: 4595
I have the PySpark code below. In it I create a DataFrame from another DataFrame that has been registered as a temporary view, then use a SQL query to add a new field. The expression for that field originally comes from PostgreSQL, and I'm wondering what the correct PySpark SQL version of the CASE statement and regex would be:
case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end
Do I just use cast(field2 as string)? Also, what is the correct PySpark SQL version of the regex test?
Code:
import datetime
import math
import sys
import time
import traceback

from pyspark import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext, Window
from pyspark.sql.types import *  # StringType lives here, not in pyspark.sql.functions
from pyspark.sql.functions import (
    coalesce, col, date_add, date_format, date_trunc, datediff, dayofmonth,
    dayofyear, first, format_number, hour, lag, length, lit, min, month,
    substring, to_date, trim, udf, unix_timestamp, upper, weekofyear, when,
    year, UserDefinedFunction,
)
table_df.createOrReplaceTempView("table")
query="""select
case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end as field1
from table a"""
df=spark.sql(query)
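To be clear about what the expression does: the pattern '^[0-9]+$' just tests whether the value consists entirely of one or more digits. In plain Python terms (sample values made up for illustration):

```python
import re

# '^[0-9]+$' accepts only strings made up entirely of one or more digits
pattern = re.compile(r"^[0-9]+$")

checks = {value: bool(pattern.match(value)) for value in ["123", "0042", "12a", "", "3.14"]}
print(checks)
# -> {'123': True, '0042': True, '12a': False, '': False, '3.14': False}
```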
Upvotes: 0
Views: 1344
Reputation: 42422
You can use rlike, which tests a Java regular expression, in place of the Postgres ~ operator, and cast(field2 as string) in place of field2::varchar if the column is not already a string:
query = """
select
case when cast(a.field2 as string) rlike '^[0-9]+$'
then cast(a.field2 as string)
else '0'
end as field1
from table a
"""
df = spark.sql(query)
Upvotes: 1