Reputation: 79
I have a table whose description is as follows:
# col_name data_type comment
id string
persona_model map<string,struct<score:double,tag:string>>
# Partition Information
# col_name data_type comment
process_date string
A sample row would look something like this (tab separated):
000000E91010441BB122402A45D439E7 {"Tech":{"score":0.21678,"tag":"OTHERS"}} 2018-05-16-01
Now I want to form another table with only two columns: id and its respective score.
How can I do it in Scala Spark?
Moreover, what's really bugging me is how I can access only a particular score and store it in a numeric variable, let's say temp?
Upvotes: 1
Views: 4908
Reputation: 66
You can do this:
val newDF = oldDF.select(col("id"), col("persona_model")("Tech")("score").as("temp"))
Then you can extract the temp values easily.
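To answer the second part of the question, here is a minimal sketch of pulling one score into a local variable, assuming the `newDF` from above. Note that the schema declares `score` as a `double`, so the local variable is a `Double`, not an `Int`:

```scala
import org.apache.spark.sql.functions.col

// Keep only rows where the "Tech" score exists, then pull the
// first value back to the driver as a plain Double.
val temp: Double = newDF
  .where(col("temp").isNotNull)
  .head()          // Row(id, temp)
  .getDouble(1)    // position 1 is the "temp" column
```

If no row has a "Tech" score, `head()` throws, so in practice you may prefer `newDF.where(col("temp").isNotNull).take(1)` and check the returned array for emptiness.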
Update: if you have more than one key, the procedure is a little more complex.
First, create a case class for the struct (necessary for the type cast):
case class Score(score: Double, tag: String)
then extract all the keys from the data:
val keys = oldDF.rdd
  .flatMap(r => r.getMap(1).asInstanceOf[Map[String, Score]].toList)
  .collect().map(_._1).distinct.toList
Finally, you can extract all the scores like this:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, when}

def condition(keys: List[String]): Column = keys match {
  case k :: ks =>
    when(col("persona_model")(k)("score").isNotNull,
         col("persona_model")(k)("score")).otherwise(condition(ks))
  case Nil => lit(null) // no key matched; note Nil, not lowercase nil
}
val newDF = oldDF.select(col("id"), condition(keys).as("score"))
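A hedged end-to-end sketch of the whole procedure, assuming a local SparkSession and a tiny hand-built dataset shaped like the question's table (the ids and map contents here are purely illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("persona-scores")
  .getOrCreate()
import spark.implicits._

// Illustrative rows: id -> map of persona key to Score struct
val oldDF = Seq(
  ("id-1", Map("Tech"   -> Score(0.21678, "OTHERS"))),
  ("id-2", Map("Sports" -> Score(0.9, "FAN")))
).toDF("id", "persona_model")

// Collect the distinct map keys to the driver
// (only the keys are needed, so the values can stay untyped)
val keys = oldDF.rdd
  .flatMap(_.getMap[String, Any](1).keys)
  .distinct()
  .collect()
  .toList

// Build the two-column result using the condition() helper above
val newDF = oldDF.select(col("id"), condition(keys).as("score"))
newDF.show(false)
```

Collecting the keys requires a full pass over the data, so if the set of persona keys is known up front, you can skip the `rdd` step and pass them to `condition` directly.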
Upvotes: 1