Priyanka
Priyanka

Reputation: 51

Create dynamic query from the Dataframe present in Spark Scala

I have a dataframe DF as below. Based on the Issue column and Datatype column I wants to create a dynamic query. If Issue column is YES then check for the Datatype, If its StringType add Trim(DiffColumnName) to the query or if Datatype is integer do some other operation like round(COUNT,2) And for the column for which Issue type is NO do nothing and select the Column itself

Query should be like this

Select DEST_COUNTRY_NAME, trim(ORIGIN_COUNTRY_NAME),round(COUNT,2)
+-------------------+-----------+-----+
|     DiffColumnName|   Datatype|Issue|
+-------------------+-----------+-----+
|  DEST_COUNTRY_NAME| StringType|   NO|
|ORIGIN_COUNTRY_NAME| StringType|  YES|
|              COUNT|IntegerType|  YES|
+-------------------+-----------+-----+

I am not sure if I should be using If else condition here or case statement or create a UDF. Also my dataframe (i.e. columns) are dynamic and will be changed every time.

Need some suggestions how to proceed here. Thanks

Upvotes: 0

Views: 728

Answers (1)

linusRian
linusRian

Reputation: 340

This can be accomplished using the following piece of code.

  • Derive the new column by applying the required operations
  • Use collect_list to aggregate the values to an array
  • Format the output using concat_ws and concat
val origDF=Seq(("DEST_COUNTRY_NAME","StringType","NO"),
("ORIGIN_COUNTRY_NAME","StringType","YES"),
("COUNT","IntegerType","YES"),
("TESTCOL","StringType","NO")
).toDF("DiffColumnName","Datatype","Issue")

val finalDF=origDF.withColumn("newCol",when(col("Issue")==="YES" && col("DataType")==="StringType",concat(lit("trim("),col("DiffColumnName"),lit(")")))
when(col("Issue")==="YES" && col("DataType")==="IntegerType",concat(lit("round("),col("DiffColumnName"),lit(",2)")))
when(col("Issue")==="NO",col("DiffColumnName"))
)

finalDF.agg(collect_list("newCol").alias("queryout")).select(concat(lit("select "),concat_ws(",",col("queryout")))).show(false)

I included an additional column to the data for testing and it is giving me the desired output.

+-------------------------------------------------------------------------+
|concat(select , concat_ws(,, queryout))                                  |
+-------------------------------------------------------------------------+
|select DEST_COUNTRY_NAME,trim(ORIGIN_COUNTRY_NAME),round(COUNT,2),TESTCOL|
+-------------------------------------------------------------------------+

Upvotes: 1

Related Questions