Reputation: 19
I want to find the distinct values from this query in scala
select
key,
count(distinct suppKey)
from
file
group by
key ;
I write this code in scala, but didn't working.
val count= file.map(line=> (line.split('|')(0),line.split('|')(1)).distinct().count())
I make split, because key is in the first row in file, and suppkey in the second.
File:
1|52|3956|337.0
1|77|4069|357.8
1|7|14|35.2
2|3|8895|378.4
2|3|4969|915.2
2|3|8539|438.3
2|78|3025|306.3
Expected output:
1|3
2|2
Upvotes: 0
Views: 1524
Reputation: 1954
Done in spark REPL. test.txt is the file with the text you've provided
val d = sc.textFile("test.txt")
d.map(x => (x.split("\\|")(0), x.split("\\|")(1))).distinct.countByKey
scala.collection.Map[String,Long] = Map(2 -> 2, 1 -> 3)
Upvotes: 1
Reputation: 36229
Instead of a file, for simpler testing, I use a String:
scala> val s="""1|52|3956|337.0
| 1|77|4069|357.8
| 1|7|14|35.2
| 2|3|8895|378.4
| 2|3|4969|915.2
| 2|3|8539|438.3
| 2|78|3025|306.3"""
scala> s.split("\n").map (line => {val sp = line.split ('|'); (sp(0), sp(1))}).distinct.groupBy (_._1).map (e => (e._1, e._2.size))
res198: scala.collection.immutable.Map[String,Int] = Map(2 -> 2, 1 -> 3)
Imho, we need a groupBy to specify what to group over, and to count groupwise.
Upvotes: 1