Reputation: 281
I am new to Spark programming and have a scenario where I need to assign a value when a set of values appears in my input. Below is the traditional SQL I would use to accomplish the task; I need to do the same in Spark.
SQL code:
SELECT CASE WHEN c.Number IN ( '1121231', '31242323' ) THEN 1
ELSE 2 END AS Test
FROM Input c
I am aware of using when in Spark with just one condition:
Input.select(when(Input.Number==1121231,1).otherwise(2).alias("Test")).show()
Upvotes: 2
Views: 16946
Reputation: 3911
I'm assuming you're working with Spark DataFrames, not RDDs. One thing to note is that you can run SQL queries directly on a DataFrame:
# register the DataFrame so we can refer to it in queries
sqlContext.registerDataFrameAsTable(df, "df")
# put your SQL query in a string
query = """SELECT CASE WHEN
df.number IN ('1121231', '31242323') THEN 1 ELSE 2 END AS test
FROM df"""
result = sqlContext.sql(query)
result.show()
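On Spark 2.0 and later, the same pattern works through a SparkSession instead of the SQLContext. A minimal sketch, assuming your session object is named spark:
# register the DataFrame as a temporary view (Spark 2.0+ API)
df.createOrReplaceTempView("df")

query = """SELECT CASE WHEN
    number IN ('1121231', '31242323') THEN 1 ELSE 2 END AS test
FROM df"""

result = spark.sql(query)
result.show()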
You can also use select by creating a user-defined function (UDF) that mimics your query's CASE statement:
from pyspark.sql.types import *
from pyspark.sql.functions import udf
# need to pass inner function through udf() so it can operate on Columns
# also need to specify return type
column_in_list = udf(
lambda column: 1 if column in ['1121231', '31242323'] else 2,
IntegerType()
)
# call function on column, name resulting column "transformed"
result = df.select(column_in_list(df.number).alias("transformed"))
result.show()
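For completeness, the CASE logic can also be written without a UDF by combining when with isin from the built-in column functions, which stays close to the when pattern you already know. A minimal sketch, assuming the column is named number as above:
from pyspark.sql.functions import when

# isin mirrors SQL's IN (...) membership test
result = df.select(
    when(df.number.isin('1121231', '31242323'), 1).otherwise(2).alias("test")
)
result.show()
Built-in column expressions like this generally perform better than Python UDFs, since they avoid serializing each row out to the Python interpreter.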
Upvotes: 6