Reputation: 109
Trying to do a simple count in PySpark programmatically, but coming up with errors. .count() works at the end of the statement if I drop AS (count(city)), but I need the count to appear inside the query, not on the outside.
result = spark.sql("SELECT city AS (count(city)) AND business_id FROM business WHERE city = 'Reading'")
One of many errors:
Py4JJavaError: An error occurred while calling o24.sql.
: org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '(' expecting ')'(line 1, pos 21)
== SQL ==
SELECT city AS (count(city)) AND business_id FROM business WHERE city = 'Reading'
---------------------^^^
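For context, the working form mentioned above looks roughly like this (a sketch; the exact column list is assumed):

result = spark.sql("SELECT city, business_id FROM business WHERE city = 'Reading'")
# .count() is a DataFrame action that returns the number of matching rows
result.count()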
Upvotes: 0
Views: 134
Reputation: 109
Just my own workaround for the problem I'm trying to solve; the window-function answer in this thread is where I would actually like to end up.
result = spark.sql("SELECT count(*) FROM business WHERE city='Reading'")
Upvotes: 0
Reputation: 42422
Your syntax is incorrect. Maybe you want to do this instead:
result = spark.sql("""
SELECT
count(city) over(partition by city),
business_id
FROM business
WHERE city = 'Reading'
""")
You need to provide a window if you use count without group by. In this case, you probably want a count for each city.
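If you only need one count per city, rather than the same count repeated next to every business_id, a plain group by is an alternative (a sketch, assuming the same business temp view; the alias city_count is just for illustration):

result = spark.sql("""
    SELECT city, count(city) AS city_count
    FROM business
    WHERE city = 'Reading'
    GROUP BY city
""")
result.show()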
Upvotes: 2