Reputation: 51
I have a dataframe DF as below. Based on the Issue column and Datatype column I wants to create a dynamic query. If Issue column is YES then check for the Datatype, If its StringType add Trim(DiffColumnName) to the query or if Datatype is integer do some other operation like round(COUNT,2) And for the column for which Issue type is NO do nothing and select the Column itself
Query should be like this
Select DEST_COUNTRY_NAME, trim(ORIGIN_COUNTRY_NAME),round(COUNT,2)
+-------------------+-----------+-----+
| DiffColumnName| Datatype|Issue|
+-------------------+-----------+-----+
| DEST_COUNTRY_NAME| StringType| NO|
|ORIGIN_COUNTRY_NAME| StringType| YES|
| COUNT|IntegerType| YES|
+-------------------+-----------+-----+
I am not sure if I should be using If else condition here or case statement or create a UDF. Also my dataframe (i.e. columns) are dynamic and will be changed every time.
Need some suggestions how to proceed here. Thanks
Upvotes: 0
Views: 728
Reputation: 340
This can be accomplished using the following piece of code.
val origDF=Seq(("DEST_COUNTRY_NAME","StringType","NO"),
("ORIGIN_COUNTRY_NAME","StringType","YES"),
("COUNT","IntegerType","YES"),
("TESTCOL","StringType","NO")
).toDF("DiffColumnName","Datatype","Issue")
val finalDF=origDF.withColumn("newCol",when(col("Issue")==="YES" && col("DataType")==="StringType",concat(lit("trim("),col("DiffColumnName"),lit(")")))
when(col("Issue")==="YES" && col("DataType")==="IntegerType",concat(lit("round("),col("DiffColumnName"),lit(",2)")))
when(col("Issue")==="NO",col("DiffColumnName"))
)
finalDF.agg(collect_list("newCol").alias("queryout")).select(concat(lit("select "),concat_ws(",",col("queryout")))).show(false)
I included an additional column to the data for testing and it is giving me the desired output.
+-------------------------------------------------------------------------+
|concat(select , concat_ws(,, queryout)) |
+-------------------------------------------------------------------------+
|select DEST_COUNTRY_NAME,trim(ORIGIN_COUNTRY_NAME),round(COUNT,2),TESTCOL|
+-------------------------------------------------------------------------+
Upvotes: 1