Reputation: 3
I have this dataFrames :
+----+-------+-----------+...+------+----------------+---------+
|mot1| brand| device|...|action|Column_to_modify|New_value|
+----+-------+-----------+...------+----------------+---------+
| 09| Tesla| PC|...|modify| brand| Jeep|
| 10| Tesla|SmallTablet|...|modify| brand| Jeep|
| 09| Tesla| PC|...|modify| brand| Jeep|
| 10| Tesla|SmallTablet|...|modify| mot1| 20|
| 10| Tesla|SmallTablet|...|modify| mot1| 20|
+----+-------+-----------+...+------+----------------+---------+
So how can I modify columns using the "Column_to_modify" and "New_value" columns ?
What I want is:
+----+-------+-----------+...+------+----------------+---------+
|mot1| brand| device|...|action|Column_to_modify|New_value|
+----+-------+-----------+...------+----------------+---------+
| 09| Jeep| PC|...|modify| brand| Jeep|
| 10| Jeep|SmallTablet|...|modify| brand| Jeep|
| 09| Jeep| PC|...|modify| brand| Jeep|
| 20| Tesla|SmallTablet|...|modify| mot1| 20|
| 20| Tesla|SmallTablet|...|modify| mot1| 20|
+----+-------+-----------+...+------+----------------+---------+
Any ideas?
Upvotes: 0
Views: 3087
Reputation: 7207
With UDF assigned to each column:
val df = List(
("09", "Tesla", "PC", "modify", "brand", "Jeep"),
("10", "Tesla", "SmallTablet", "modify", "brand", "Jeep"),
("09", "Tesla", "PC", "modify", "brand", "Jeep"),
("10", "Tesla", "SmallTablet", "modify", "mot1", "20"),
("10", "Tesla", "SmallTablet", "modify", "mot1", "20")
).toDF("mot1", "brand", "device", "action", "Column_to_modify", "New_value")
val modifyColumn = (colName: String, colValue: String, modifyColumnName: String, modifyColumnValue: String) =>
if (colName.equals(modifyColumnName)) modifyColumnValue else colValue
val modifyColumnUDF = udf(modifyColumn)
val result = df
.withColumn("mot1", modifyColumnUDF(lit("mot1"), $"mot1", $"Column_to_modify", $"New_value"))
.withColumn("brand", modifyColumnUDF(lit("brand"), $"brand", $"Column_to_modify", $"New_value"))
result.show(false)
Output:
+----+-----+-----------+------+----------------+---------+
|mot1|brand|device |action|Column_to_modify|New_value|
+----+-----+-----------+------+----------------+---------+
|09 |Jeep |PC |modify|brand |Jeep |
|10 |Jeep |SmallTablet|modify|brand |Jeep |
|09 |Jeep |PC |modify|brand |Jeep |
|20 |Tesla|SmallTablet|modify|mot1 |20 |
|20 |Tesla|SmallTablet|modify|mot1 |20 |
+----+-----+-----------+------+----------------+---------+
Upvotes: 1
Reputation: 6739
one quick way of achieving this is using map
operation and convert into desired data format something like below:
import org.json.JSONObject
// creating input dataframe by reading input file
val inputDF = sparkSession.read.option("header", "true").csv("my_input_file.csv")
inputDF.printSchema()
inputDF.show(false)
val resultRDD = inputDF.toJSON.rdd.map(row => {
val json = new JSONObject(row)
val columnToModify = json.getString("Column_to_modify")
val newValue = json.get("New_value")
if (json.has(columnToModify)) {
json.put(columnToModify, newValue)
}
json.toString
})
// converting the result RDD into dataframe
val finalOutputDF = sparkSession.read.json(resultRDD)
finalOutputDF.printSchema()
finalOutputDF.show(false)
and the output will be as follows:
root
|-- mot1: string (nullable = true)
|-- brand: string (nullable = true)
|-- device: string (nullable = true)
|-- action: string (nullable = true)
|-- Column_to_modify: string (nullable = true)
|-- New_value: string (nullable = true)
+----+-----+-----------+------+----------------+---------+
|mot1|brand|device |action|Column_to_modify|New_value|
+----+-----+-----------+------+----------------+---------+
|09 |Tesla|PC |modify|brand |Jeep |
|10 |Tesla|SmallTablet|modify|brand |Jeep |
|09 |Tesla|PC |modify|brand |Jeep |
|10 |Tesla|SmallTablet|modify|mot1 |20 |
|10 |Tesla|SmallTablet|modify|mot1 |20 |
+----+-----+-----------+------+----------------+---------+
root
|-- Column_to_modify: string (nullable = true)
|-- New_value: string (nullable = true)
|-- action: string (nullable = true)
|-- brand: string (nullable = true)
|-- device: string (nullable = true)
|-- mot1: string (nullable = true)
+----------------+---------+------+-----+-----------+----+
|Column_to_modify|New_value|action|brand|device |mot1|
+----------------+---------+------+-----+-----------+----+
|brand |Jeep |modify|Jeep |PC |09 |
|brand |Jeep |modify|Jeep |SmallTablet|10 |
|brand |Jeep |modify|Jeep |PC |09 |
|mot1 |20 |modify|Tesla|SmallTablet|20 |
|mot1 |20 |modify|Tesla|SmallTablet|20 |
+----------------+---------+------+-----+-----------+----+
Observe in the second dataframe, the keys are in sorted order and with the desired output.
Upvotes: 0