Elias Konstantinou

Reputation: 19

How to get a distinct count per key in Scala

I want to reproduce the result of this SQL query in Scala: for each key, the number of distinct suppKey values.

select  
    key,     
    count(distinct suppKey)  
from  
    file
group by  
    key ; 

I wrote this code in Scala, but it doesn't work.

val count= file.map(line=> (line.split('|')(0),line.split('|')(1)).distinct().count())

I split each line because key is the first column in the file and suppKey is the second.

File:

1|52|3956|337.0
1|77|4069|357.8
1|7|14|35.2
2|3|8895|378.4
2|3|4969|915.2
2|3|8539|438.3
2|78|3025|306.3

Expected output:

1|3
2|2

Upvotes: 0

Views: 1524

Answers (2)

pramesh

Reputation: 1954

Done in the Spark REPL. test.txt is a file containing the text you provided.

val d = sc.textFile("test.txt")
d.map(x => (x.split("\\|")(0), x.split("\\|")(1))).distinct.countByKey

scala.collection.Map[String,Long] = Map(2 -> 2, 1 -> 3)
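
A side note on the split call used above: the String overload of split takes a regular expression, which is why the pipe is escaped as "\\|", while the Char overload used in the question takes a literal character. The sketch below is plain Scala with no Spark dependency; the object name SplitSketch and its value names are made up for illustration. It shows that both forms give the same columns, and that splitting once per line is enough to build the (key, suppKey) pair.

```scala
object SplitSketch {
  // One line of the sample file from the question.
  val line = "1|52|3956|337.0"

  // String overload: the argument is a regular expression, so '|' must be escaped.
  val byRegex: Array[String] = line.split("\\|")

  // Char overload: the argument is a literal character, no escaping needed.
  val byChar: Array[String] = line.split('|')

  // Splitting once and reusing the array avoids parsing each line twice.
  val pair: (String, String) = (byChar(0), byChar(1))

  def main(args: Array[String]): Unit = {
    println(byRegex.sameElements(byChar)) // both overloads yield the same columns
    println(pair)
  }
}
```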

Upvotes: 1

user unknown

Reputation: 36229

Instead of a file, for simpler testing, I use a String:

scala> val s="""1|52|3956|337.0
     | 1|77|4069|357.8
     | 1|7|14|35.2
     | 2|3|8895|378.4
     | 2|3|4969|915.2
     | 2|3|8539|438.3
     | 2|78|3025|306.3"""

scala> s.split("\n").map (line => {val sp = line.split ('|'); (sp(0), sp(1))}).distinct.groupBy (_._1).map (e => (e._1, e._2.size))
res198: scala.collection.immutable.Map[String,Int] = Map(2 -> 2, 1 -> 3)

IMHO, we need a groupBy to specify what to group by, and then a count within each group.
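
The same idea can be sketched with the de-duplication done inside each group, by collecting the second column into a Set, rather than calling distinct on the pairs first. This is only a minimal sketch on plain Scala collections, assuming the pipe-delimited input from the question; DistinctCountSketch, distinctCounts and sample are hypothetical names, not part of any library.

```scala
object DistinctCountSketch {
  // Hypothetical helper: for each first-column key, count the distinct
  // second-column values.
  def distinctCounts(lines: Seq[String]): Map[String, Int] =
    lines
      .map { line =>
        val cols = line.split('|') // split on the literal '|' character
        (cols(0), cols(1))         // (key, suppKey)
      }
      .groupBy(_._1)               // group the pairs by key
      .map { case (key, pairs) =>
        key -> pairs.map(_._2).toSet.size // a Set drops duplicate suppKeys
      }

  // Sample data copied from the question.
  val sample: Seq[String] = Seq(
    "1|52|3956|337.0",
    "1|77|4069|357.8",
    "1|7|14|35.2",
    "2|3|8895|378.4",
    "2|3|4969|915.2",
    "2|3|8539|438.3",
    "2|78|3025|306.3"
  )

  def main(args: Array[String]): Unit =
    // Sort by key so the lines come out in the order the question expects.
    distinctCounts(sample).toSeq.sorted.foreach { case (k, n) => println(s"$k|$n") }
}
```

Running main prints the two lines of the expected output, 1|3 and 2|2.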

Upvotes: 1
