Reputation: 151
To remove duplicate rows, I tried this SQL:
val characters = MongoSpark.load[sparkSQL.Character](sparkSession)
characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT * FROM characters GROUP BY title")
testsql.show()
but this SQL produces the error message below. If you know what the problem is, please answer this question.
Thank you.
Parsing command: SELECT * FROM characters GROUP BY title
Exception in thread "main" org.apache.spark.sql.AnalysisException:
expression 'characters.`url`' is neither present in the group by, nor is it an aggregate function
Add to group by or wrap in first() if you don't care which value you get.;;
I then tried the following, but I don't know whether it is the right solution.
Please answer this question. Thank you!
val characters = MongoSpark.load[sparkSQL.Character](sparkSession)
characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT * FROM characters")
val testgrsql = testsql.groupBy("title")
testgrsql.show()
Upvotes: 1
Views: 69
Reputation: 35404
The error message explains everything:
Parsing command: SELECT * FROM characters GROUP BY title
Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 'characters.`url`' is neither present in the group by, nor is it an aggregate function
Add to group by or wrap in first() if you don't care which value you get.;;
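Every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function. For example, if you want the first url value for each title, use first(url):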
characters.createOrReplaceTempView("characters")
val testsql = sparkSession.sql("SELECT title, first(url) FROM characters GROUP BY title")
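Since the original goal was to remove duplicate rows, here is a minimal sketch of two ways to keep one row per title with the DataFrame API. It assumes a DataFrame with only `title` and `url` columns; the helper names `dedupWithFirst` and `dedupWithDrop` and the sample data are hypothetical, standing in for the collection loaded through MongoSpark.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.first

object DedupByTitle {
  // Hypothetical helper: group by title and keep one url per group.
  // Which url first() returns is not guaranteed without an explicit ordering.
  def dedupWithFirst(df: DataFrame): DataFrame =
    df.groupBy("title").agg(first("url").as("url"))

  // Hypothetical helper: keep one whole row per title, dropping the rest.
  def dedupWithDrop(df: DataFrame): DataFrame =
    df.dropDuplicates("title")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dedup-by-title")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows; in the question these come from MongoSpark.load.
    val characters = Seq(
      ("Alice", "http://example.com/a1"),
      ("Alice", "http://example.com/a2"),
      ("Bob", "http://example.com/b1")
    ).toDF("title", "url")

    dedupWithFirst(characters).show(false)
    dedupWithDrop(characters).show(false)

    spark.stop()
  }
}
```

If the table has more columns than `title` and `url`, `dropDuplicates("title")` is usually the shorter option, because the `first()` approach needs one aggregate per remaining column.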
Upvotes: 1