Reputation: 305
I have a SQL query that I want to convert to Spark Scala (DataFrame API):
SELECT aid, DId, BM, BY
FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1;
SU is my DataFrame. I can run it like this:
sqlContext.sql("""
SELECT aid,DId,BM,BY
FROM (SELECT DISTINCT aid,DId,BM,BY,TO FROM SU WHERE cd =2) t
GROUP BY aid,DId,BM,BY HAVING COUNT(*) >1
""")
Instead, I would like to express the same query directly with DataFrame operations on SU.
Upvotes: 0
Views: 10499
Reputation: 37852
This should be the DataFrame equivalent:
SU.filter($"cd" === 2)                  // WHERE cd = 2
  .select("aid", "DId", "BM", "BY", "TO")
  .distinct()                           // SELECT DISTINCT ... in the subquery
  .groupBy("aid", "DId", "BM", "BY")
  .count()                              // adds a "count" column per group
  .filter($"count" > 1)                 // HAVING COUNT(*) > 1
  .select("aid", "DId", "BM", "BY")     // drop the helper "count" column
Upvotes: 2