Reputation: 323
I am trying to filter out the crime data that happened on Sunday in the SF downtown area. After I successfully label the rows with 1 (in downtown) and 0 (not in downtown), I try to count the number of crimes for each Sunday in SF downtown, but I get a "table or view not found" error. I don't understand this, since I just created the table (in the same cell) that I am trying to access. To work around it, I used `createOrReplaceTempView()` to register the DataFrame as a table. After that, I got a new error instead of the old one.

Can someone please tell me if this is the right approach to solve the "table not found" error? I am also confused about how to count the rows that are labeled 1. This is what I have right now, and I don't understand why it is not working. Here are two pictures of what the table looks like:
And here is the error I am getting:
from pyspark.sql.functions import when
import pyspark.sql.functions as F
#First, pick out the crime cases that happen on Sunday
q3_sunday = spark.sql("SELECT * FROM sf_crime WHERE DayOfWeek='Sunday'")
#Then, we add a new column for us to filter out(identify) if the crime is in DT
q3_final = q3_sunday.withColumn("isDT",F.when(((q3_sunday.X.between(-122.4313,-122.4213))&
(q3_sunday.Y.between(37.7540,37.7740))),1).otherwise(0))
#Last but not least, I count the crimes that happen each Sunday at SF downtown
#using the newly added column with its True(1)/False(0) labels
q3_final.createOrReplaceTempView("q3final_tbl")
sunday_dt = spark.sql("SELECT isDT, COUNT(*) AS Count FROM q3final_tbl WHERE isDT='1' GROUP BY DayofWeek ORDER BY Count DESC")
Upvotes: 0
Views: 361
Reputation: 1054
You need to correct the GROUP BY clause in the SQL. It should be:
SELECT isDT, COUNT(*) AS Count FROM q3final_tbl WHERE isDT='1' GROUP BY isDT ORDER BY Count DESC
Upvotes: 1