Reputation: 359
I am trying to rank the 'Product' column by revenue in the DataFrame below, salesDF:
salesDF=
+-------------+-------+---------+----------+-------+
|transactionID|Product| category|produtType|Revenue|
+-------------+-------+---------+----------+-------+
|          105| Lenova|   laptop|      high|  40000|
|          111| Lenova|   tablet|    medium|  20000|
|          103|   dell|   laptop|     medum|  25000|
|          107| iphone|cellPhone|     small|  70000|
|          113| lenovo|cellPhone|    medium|   8000|
|          108|     mi|cellPhone|     medum|  10000|
I am using Spark SQL to rank each Product by Revenue:
rankTheRevenue= salesDF.createTempView("Ranking_DF")
rankProduct= session.sql("select Product, Revenue, rank() over(partion by Product order by Revenue) as Rank_revenue from Ranking_DF")
rankProduct.show()
pyspark.sql.utils.ParseException:
mismatched input '(' expecting {<EOF>, ',', 'CLUSTER', 'DISTRIBUTE', 'EXCEPT', 'FROM', 'GROUP', 'HAVING', 'INTERSECT', 'LATERAL', 'LIMIT', 'ORDER', 'MINUS', 'SORT', 'UNION', 'WHERE', 'WINDOW', '-'}(line 1, pos 36)
I would appreciate any help resolving this error.
Thanks
Reputation: 31460
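You have a typo in the 'partition by' clause: the query spells it 'partion by'. The error points at position 36, which is the '(' right after 'over', most likely because the misspelled keyword keeps the window specification from parsing.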
Try with:
salesDF.createTempView("Ranking_DF")  # returns None, so there is nothing to assign
rankProduct= session.sql("select Product, Revenue, rank() over(partition by Product order by Revenue) as Rank_revenue from Ranking_DF")
rankProduct.show()
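To sanity-check the corrected window clause without a Spark session, here is a minimal sketch that runs the same query text against an in-memory SQLite database. This is a stand-in, not Spark: it assumes Python's bundled SQLite is version 3.25 or newer (the first release with window functions). The helper name rank_products is hypothetical, and only the Product and Revenue columns from the question's sample are loaded. An outer order by is added so the printed rows come out in a stable order.

```python
import sqlite3

# Sample rows copied from the question's salesDF (Product and Revenue only).
rows = [
    ("Lenova", 40000),
    ("Lenova", 20000),
    ("dell", 25000),
    ("iphone", 70000),
    ("lenovo", 8000),
    ("mi", 10000),
]


def rank_products(rows):
    """Rank revenue within each product (hypothetical helper, SQLite stand-in for Spark)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Ranking_DF (Product text, Revenue integer)")
    conn.executemany("insert into Ranking_DF values (?, ?)", rows)
    # Same select list and window clause as the corrected Spark SQL query;
    # the trailing "order by" is an addition to make the output order stable.
    query = (
        "select Product, Revenue, "
        "rank() over(partition by Product order by Revenue) as Rank_revenue "
        "from Ranking_DF order by Product, Rank_revenue"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


ranked = rank_products(rows)
for row in ranked:
    print(row)
```

Because the window is partitioned by Product, the rank restarts for every distinct product name. In this sample only 'Lenova' has two rows, so it is the only product that gets a rank of 2.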