Romal Jaiswal

Reputation: 79

How to access map values and keys stored in a DataFrame in Scala Spark

I have a table whose description is as follows:

# col_name              data_type               comment             

id                      string                                      
persona_model           map<string,struct<score:double,tag:string>>                     

# Partition Information      
# col_name              data_type               comment             

process_date            string          

A sample row would look something like this (tab separated):

000000E91010441BB122402A45D439E7        {"Tech":{"score":0.21678,"tag":"OTHERS"}}    2018-05-16-01              

Now I want to form another table with only two columns: id and its respective score.
How can I do it in Scala Spark?

Moreover, what's really bugging me is: how can I access only a particular score, and how can I store it in a variable, let's say temp?

Upvotes: 1

Views: 4908

Answers (1)

illak zapata

Reputation: 66

You can do this:

val newDF = oldDF.select(col("id"), col("persona_model")("Tech")("score").as("temp"))

Then you can extract the temp values easily.
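To land the score in a plain Scala variable (the temp the question asks about), one option is to collect a single row. Here is a self-contained sketch with an invented one-row dataset shaped like the question's table; note the schema stores the score as a Double, not an Int:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical value type matching map<string, struct<score:double, tag:string>>.
case class Score(score: Double, tag: String)

val spark = SparkSession.builder.master("local[1]").appName("score-demo").getOrCreate()
import spark.implicits._

// Invented single row shaped like the question's table.
val oldDF = Seq(
  ("000000E91010441BB122402A45D439E7", Map("Tech" -> Score(0.21678, "OTHERS")))
).toDF("id", "persona_model")

val newDF = oldDF.select(col("id"), col("persona_model")("Tech")("score").as("temp"))

// head() brings one Row to the driver; getAs reads the named column.
val temp: Double = newDF.head().getAs[Double]("temp")
```

Collecting with head() only makes sense for a handful of rows; for a full table you would keep the values in the DataFrame.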

Update: if you have more than one key, the procedure is a little more complex.

First, create a case class for the struct (necessary for the type cast):

case class Score(score: Double, tag: String)

then extract all the keys from the data:

val keys = oldDF.rdd
    .flatMap(r => r.getMap[String, Score](1).keys)
    .collect().distinct.toList
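If you are on Spark 2.3 or later, the same distinct keys can be collected without the RDD round-trip, using the built-in map_keys and explode functions. A sketch with invented two-row data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, map_keys}

// Hypothetical value type matching the table's map value struct.
case class Score(score: Double, tag: String)

val spark = SparkSession.builder.master("local[1]").appName("keys-demo").getOrCreate()
import spark.implicits._

val oldDF = Seq(
  ("id1", Map("Tech"   -> Score(0.21678, "OTHERS"))),
  ("id2", Map("Sports" -> Score(0.5, "FAN")))
).toDF("id", "persona_model")

// map_keys yields an array of keys per row; explode flattens it to one key per row.
val keys: List[String] = oldDF
  .select(explode(map_keys(col("persona_model"))).as("k"))
  .distinct()
  .as[String]
  .collect()
  .toList
```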

Finally, you can extract all the scores like this:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, when}

// Returns the first non-null score among the given keys, or null if none match.
def condition(keys: List[String]): Column = keys match {
  case k :: ks =>
    when(col("persona_model")(k)("score").isNotNull, col("persona_model")(k)("score"))
      .otherwise(condition(ks))
  case Nil => lit(null)
}

val newDF = oldDF.select(col("id"), condition(keys))
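Putting the pieces together, here is a self-contained sketch with hypothetical two-key data (ids and scores invented for illustration), showing that condition picks up the score whichever key happens to be present in a row's map:

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

// Hypothetical value type matching the table's map value struct.
case class Score(score: Double, tag: String)

val spark = SparkSession.builder.master("local[1]").appName("condition-demo").getOrCreate()
import spark.implicits._

// Two rows, each with a different key in its map.
val oldDF = Seq(
  ("id1", Map("Tech"   -> Score(0.21678, "OTHERS"))),
  ("id2", Map("Sports" -> Score(0.5, "FAN")))
).toDF("id", "persona_model")

val keys = List("Tech", "Sports")

// First non-null score across the candidate keys; null when no key matches.
def condition(keys: List[String]): Column = keys match {
  case k :: ks =>
    when(col("persona_model")(k)("score").isNotNull, col("persona_model")(k)("score"))
      .otherwise(condition(ks))
  case Nil => lit(null)
}

val newDF = oldDF.select(col("id"), condition(keys).as("score"))
```

A missing key simply yields null from the map lookup, so the when/otherwise chain falls through to the next candidate key.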

Upvotes: 1
