Reputation: 141
I am new to Scala. I have a DataFrame with the fields
ID:string, Time:timestamp, Items:array(struct(name:string,ranking:long))
I want to convert each row of the Items field to a hashmap, with the name as the key. I am not sure how to do this.
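For reference, here is a minimal sketch of that schema built explicitly (a reconstruction from the description above, not my actual code):
import org.apache.spark.sql.types._
val schema = StructType(Seq(
  StructField("ID", StringType),
  StructField("Time", TimestampType),
  StructField("Items", ArrayType(StructType(Seq(
    StructField("name", StringType),
    StructField("ranking", LongType)
  ))))
))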
Upvotes: 5
Views: 7145
Reputation: 6099
Since Spark 2.4.0, one can use map_from_entries:
import spark.implicits._
import org.apache.spark.sql.functions._
// Sample data: a single column holding an array of (name, ranking) structs
val df = Seq(
  Array(("n1", 4L), ("n2", 5L)),
  Array(("n3", 6L), ("n4", 7L))
).toDF("Items")
df.select(map_from_entries($"Items")).show
/*
+-----------------------+
|map_from_entries(Items)|
+-----------------------+
| [n1 -> 4, n2 -> 5]|
| [n3 -> 6, n4 -> 7]|
+-----------------------+
*/
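Applied to the full shape from the question, this would look roughly like the following (a sketch; ID/Time values are placeholders, with Time shown as a plain string for brevity):
val full = Seq(
  ("id1", "t1", Array(("n1", 4L), ("n2", 5L))),
  ("id2", "t2", Array(("n3", 6L), ("n4", 7L)))
).toDF("ID", "Time", "Items")
full.withColumn("Items", map_from_entries($"Items")).show(false)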
Upvotes: 1
Reputation: 37822
This can be done using a UDF:
import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row
// Sample data:
val df = Seq(
  ("id1", "t1", Array(("n1", 4L), ("n2", 5L))),
  ("id2", "t2", Array(("n3", 6L), ("n4", 7L)))
).toDF("ID", "Time", "Items")
// Create UDF converting array of (String, Long) structs to Map[String, Long]
val arrayToMap = udf[Map[String, Long], Seq[Row]] {
  array => array.map { case Row(key: String, value: Long) => (key, value) }.toMap
}
// apply UDF
val result = df.withColumn("Items", arrayToMap($"Items"))
result.show(false)
// +---+----+---------------------+
// |ID |Time|Items |
// +---+----+---------------------+
// |id1|t1 |Map(n1 -> 4, n2 -> 5)|
// |id2|t2 |Map(n3 -> 6, n4 -> 7)|
// +---+----+---------------------+
I can't see a way to do this without a UDF (using Spark's built-in functions only).
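As a quick sanity check, the resulting map column can be queried by key via getItem (a sketch reusing the names above; getItem returns null when the key is absent):
// look up the ranking stored under "n1" in each row's map
result.select($"ID", $"Items".getItem("n1").as("n1_ranking")).show
// yields 4 for id1 and null for id2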
Upvotes: 9