Anji

Reputation: 305

Converting SQL query to Spark

I have a SQL query which I want to convert to Spark-Scala:

SELECT aid, DId, BM, BY
FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1;

SU is my DataFrame. I did this by:

sqlContext.sql("""
  SELECT aid, DId, BM, BY
  FROM (SELECT DISTINCT aid, DId, BM, BY, TO FROM SU WHERE cd = 2) t
  GROUP BY aid, DId, BM, BY HAVING COUNT(*) > 1
""")

Instead of that, I want to express this directly with the DataFrame API.

Upvotes: 0

Views: 10499

Answers (1)

Tzach Zohar

Reputation: 37852

This should be the DataFrame equivalent:

// Note: the $"..." column syntax requires `import spark.implicits._`
// (or `import sqlContext.implicits._` on older versions).
SU.filter($"cd" === 2)                     // WHERE cd = 2
  .select("aid", "DId", "BM", "BY", "TO")  // columns of the inner SELECT
  .distinct()                              // SELECT DISTINCT
  .groupBy("aid", "DId", "BM", "BY")       // GROUP BY aid, DId, BM, BY
  .count()                                 // COUNT(*) per group, as column "count"
  .filter($"count" > 1)                    // HAVING COUNT(*) > 1
  .select("aid", "DId", "BM", "BY")        // drop the helper count column
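To see the pipeline end to end, here is a minimal self-contained sketch assuming a local SparkSession and an invented sample dataset (the column names match the question; the values and the object name `HavingCountExample` are only illustrative):

```scala
import org.apache.spark.sql.SparkSession

object HavingCountExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("having-count")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for SU
    val SU = Seq(
      (1, "d1", "m1", "y1", "t1", 2),
      (1, "d1", "m1", "y1", "t2", 2), // same group key, different TO -> count 2
      (2, "d2", "m2", "y2", "t3", 2), // only one distinct row -> count 1
      (3, "d3", "m3", "y3", "t4", 1)  // removed by the cd = 2 filter
    ).toDF("aid", "DId", "BM", "BY", "TO", "cd")

    val result = SU.filter($"cd" === 2)
      .select("aid", "DId", "BM", "BY", "TO")
      .distinct()
      .groupBy("aid", "DId", "BM", "BY")
      .count()
      .filter($"count" > 1)
      .select("aid", "DId", "BM", "BY")

    // Only the (1, d1, m1, y1) group has more than one distinct row
    result.show()

    spark.stop()
  }
}
```

The key idea is that `HAVING COUNT(*) > 1` has no dedicated DataFrame method: `groupBy(...).count()` produces a regular column named `count`, and an ordinary `filter` on that column plays the role of the HAVING clause.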

Upvotes: 2
