case statement with regex in pyspark sql

Question

I have the pyspark code below. In the code I'm creating a dataframe from another dataframe that has been converted into a temporary view. I'm then using a sql query to create a new field in the final query. The code for the field I'm trying to create originally comes from postgresql and I'm wondering what the correct version of the case statement and regex would be in pyspark sql?

case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end

do I just cast(field2 as string)?

also what would be the correct pyspark sql version of the regex test?

code:

from pyspark.sql.types import *
from pyspark.context import SparkContext
from pyspark.sql import Window
from pyspark.sql import SQLContext
from pyspark.sql.functions import col
from pyspark.sql.functions import first
from pyspark.sql.functions  import date_format
from pyspark.sql.functions import lit,StringType

from pyspark.sql.functions import date_trunc, udf,trim, upper, to_date, substring, length, min, when, format_number, dayofmonth, hour, dayofyear,  month, year, weekofyear, date_format, unix_timestamp
from pyspark import SparkConf
from pyspark.sql.functions import coalesce
from pyspark.sql import SparkSession
from pyspark.sql.functions import year, month, dayofmonth
from pyspark.sql.functions import UserDefinedFunction
import datetime
from pyspark.sql.functions import year
from pyspark.sql.functions import datediff,coalesce,lag
from pyspark.sql.functions import when, to_date
from pyspark.sql.functions import date_add
from pyspark.sql.functions import UserDefinedFunction

import traceback
import sys
import time
import math
import datetime



table_df.createOrReplaceTempView("table")


query="""select
case when a.field2::varchar ~ '^[0-9]+$' then a.field2::varchar else '0' end as field1

from table a"""


df=spark.sql(query)

mck · Accepted Answer

You can try:

query = """
    select
        case when a.field2 rlike '^[0-9]+$' 
             then a.field2
             else '0' 
             end as field1
    from table a
"""

df = spark.sql(query)

case statement with regex in pyspark sql

Answers (1)

Related Questions