Reputation: 131
I have imported a CSV file into a DataFrame in Azure Databricks using Scala.
--------------
A B C D E
--------------
a1 b1 c1 d1 e1
a2 b2 c2 d2 e2
--------------
Now I want to hash some selected columns and add each result as a new column in that DataFrame.
--------------------------------------------
A   B   B2        C   D   D2        E
--------------------------------------------
a1  b1  hash(b1)  c1  d1  hash(d1)  e1
a2  b2  hash(b2)  c2  d2  hash(d2)  e2
--------------------------------------------
This is the code I have:
val data_df = spark.read.format("csv").option("header", "true").option("sep", ",").load(input_file)
...
...
for (col <- columns) {
  if (columnMapping.keys.contains(col)) {
    val newColName = col + "_token"
    // Here I want to add a new column to data_df whose content is the hash of the current value
  }
}
// And here I would like to upload selected columns (B, B2, D, D2) to a SQL database
Any help will be highly appreciated. Thank you!
Upvotes: 0
Views: 1093
Reputation: 354
Try this:
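import org.apache.spark.sql.functions.{col, udf}

// Columns whose values should be hashed
val colsToApplyHash = Array("B", "D")
val hashFunction: String => String = <ACTUAL HASH LOGIC>
val hash = udf(hashFunction)
// Fold over the column names, adding one hashed column (B2, D2) per step
val finalDf = colsToApplyHash.foldLeft(data_df) {
  case (acc, colName) => acc.withColumn(colName + "2", hash(col(colName)))
}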
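For the `<ACTUAL HASH LOGIC>` placeholder, here is a minimal sketch of one possible `hashFunction`, assuming SHA-256 is an acceptable hash (the question does not say which algorithm is needed). The names `HashSketch` and `sha256Hex` are made up for illustration; only the JDK's `MessageDigest` is used, so it runs without Spark.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object HashSketch {
  // Hypothetical helper: lowercase hex SHA-256 digest of a string.
  // null is passed through unchanged, since CSV columns can contain nulls.
  def sha256Hex(value: String): String =
    Option(value).map { v =>
      MessageDigest.getInstance("SHA-256")
        .digest(v.getBytes(StandardCharsets.UTF_8))
        .map(b => f"${b & 0xff}%02x") // mask to 0..255 so each byte prints as two hex digits
        .mkString
    }.orNull

  // Function value with the String => String shape that udf(...) takes above.
  // Assumption: SHA-256 stands in for whatever hash you actually need.
  val hashFunction: String => String = sha256Hex(_)
}
```

You could then write `val hash = udf(HashSketch.hashFunction)` and keep the `foldLeft` as it is. If SHA-2 is what you want, Spark also has a built-in `sha2(col(colName), 256)` in `org.apache.spark.sql.functions`, which would probably let you skip the UDF altogether.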
Upvotes: 1