I have two DataFrames: a property (config) table and a main table from Hive. Based on the config, I want to drop some columns and then apply masking logic (SHA-2) to others.
I read the property config from a PostgreSQL DB as a DataFrame in a Spark/Scala job:
val propertydf = ... // load the property config DataFrame from the PostgreSQL DB
The main Hive table: (table not preserved)
The expected output: (table not preserved)
Could anyone help me write this in Spark/Scala? I can't work out how to turn the config values from the DataFrame into a List[String] and pass them to a function.
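The conversion the question is stuck on can be sketched in plain Scala, without Spark. This is a minimal sketch, not the answer's code: the helper names parseColumnList and plan, and the sample column names, are hypothetical. It splits a comma-separated config value into a List[String] and passes the lists to a function that decides, column by column, whether to mask, keep, or drop.

```scala
// Plain-Scala sketch (no Spark needed). Helper names are hypothetical.
object ColumnPlan {

  // Split a comma-separated config value into trimmed, non-empty names.
  // A null or empty value yields an empty list.
  def parseColumnList(raw: String): List[String] =
    Option(raw).toList
      .flatMap(_.split(",").toList)
      .map(_.trim)
      .filter(_.nonEmpty)

  // Walk the table's columns in their original order: excluded columns are
  // dropped, masking columns are tagged true, everything else false.
  def plan(columns: Seq[String], masking: List[String], exclude: List[String]): List[(String, Boolean)] =
    columns.toList
      .filterNot(c => exclude.contains(c))
      .map(c => (c, masking.contains(c)))

  def main(args: Array[String]): Unit = {
    // Hypothetical config values and column names, for illustration only
    val masking = parseColumnList("a, b")
    val exclude = parseColumnList("e,f")
    val columns = Seq("a", "b", "c", "d", "e", "f")
    println(plan(columns, masking, exclude)) // List((a,true), (b,true), (c,false), (d,false))
  }
}
```

In Spark, each (name, masked) pair can then become either sha2(col(name).cast("string"), 256).as(name) or col(name), and the resulting sequence is passed to df.select(cols: _*). Walking the table's columns this way keeps their original order.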
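Read the two comma-separated column lists from the config row and build the select list from them: each masking column is hashed with sha2 under its original name, and every other column that isn't excluded is kept as-is. Note that the masked columns come first in the result: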
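import org.apache.spark.sql.functions.{col, sha2}

// Read the comma-separated column lists from the first (only) config row
val config  = propertydf.head()
val masking = config.getAs[String]("maskingcolumns").split(",").map(_.trim)
val exclude = config.getAs[String]("columnstoexclude").split(",").map(_.trim)

val result = df.select(
  // Hash each masking column with SHA-256, keeping its original name
  masking.map(c => sha2(col(c).cast("string"), 256).as(c)) ++
  // Keep every other column that is neither masked nor excluded
  df.columns.filterNot(c => masking.contains(c) || exclude.contains(c)).map(col): _*
)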
result.show(false)
+----------------------------------------------------------------+----------------------------------------------------------------+---+---+
|a |b |c |d |
+----------------------------------------------------------------+----------------------------------------------------------------+---+---+
|ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad|6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b|11 |cbc|
+----------------------------------------------------------------+----------------------------------------------------------------+---+---+
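The two hashes in the output above are the SHA-256 digests of the strings "abc" and "1". A minimal plain-JVM sketch for checking masked values outside Spark, assuming sha2(..., 256) hashes the UTF-8 bytes of the string and returns lowercase hex (sha256Hex is a hypothetical helper name):

```scala
import java.security.MessageDigest

object Sha256Check {
  // SHA-256 of the string's UTF-8 bytes as lowercase hex.
  // Unlike Spark's sha2, this does not handle null (sha2 returns null for null input).
  def sha256Hex(s: String): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8"))
      .map(b => f"${b & 0xff}%02x")
      .mkString

  def main(args: Array[String]): Unit = {
    println(sha256Hex("abc")) // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
    println(sha256Hex("1"))   // 6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
  }
}
```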