Reputation: 141
I have a DataFrame with two columns: id1, of type Seq[String], and id2, of type Map[String, (String, Long, Long)].
I would like to add another column named rate, holding the proportion of ids in id1 that appear as keys in the id2 map.
I could not get a for loop to work inside the udf. How should I do this?
Upvotes: 1
Views: 681
Reputation: 214927
Use Seq.count with Map.isDefinedAt to count how many of the ids exist as keys in the Map, divide by the length of the sequence, and then simply wrap the function with udf:
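import org.apache.spark.sql.functions.udf
import spark.implicits._  // for toDF and the $"col" syntax (already in scope in spark-shell)

val df = Seq((Seq("a", "b", "c"), Map("a" -> ("x", 1L, 2L), "x" -> ("y", 2L, 2L)))).toDF("id1", "id2")
type CustMap = Map[String, (String, Long, Long)]
// Fraction of ids in id1 that are keys of id2 (NaN if id1 is empty)
val percent_in = udf(
  (id1: Seq[String], id2: CustMap) => id1.count(id2.isDefinedAt) / id1.length.toDouble
)
df.withColumn("rate", percent_in($"id1", $"id2")).show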
+---------+--------------------+------------------+
|      id1|                 id2|              rate|
+---------+--------------------+------------------+
|[a, b, c]|Map(a -> [x,1,2],...|0.3333333333333333|
+---------+--------------------+------------------+
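Since the body of the udf is ordinary Scala, the counting logic can be tried out without Spark first. The following is only a minimal sketch: the object name RateSketch and the function name rate are made up for illustration, and returning 0.0 for an empty id1 (rather than the NaN that 0 / 0.0 produces) is an assumption about what you want in that case.

```scala
// Hypothetical helper; plain Scala, no Spark needed to run it.
object RateSketch {
  type CustMap = Map[String, (String, Long, Long)]

  // Fraction of ids in id1 that occur as keys of id2.
  // Assumption: an empty id1 yields 0.0 instead of NaN.
  def rate(id1: Seq[String], id2: CustMap): Double =
    if (id1.isEmpty) 0.0
    else id1.count(id2.contains).toDouble / id1.length

  def main(args: Array[String]): Unit = {
    val m: CustMap = Map(("a", ("x", 1L, 2L)), ("x", ("y", 2L, 2L)))
    println(rate(Seq("a", "b", "c"), m)) // one of three ids is a key
    println(rate(Seq.empty, m))          // empty input falls back to 0.0
  }
}
```

Here id2.contains does the same job as id2.isDefinedAt for a Map. Once the function behaves as expected, it should be possible to wrap it in the same way as above, for example with udf(RateSketch.rate _).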
Upvotes: 1