DJ-coding
DJ-coding

Reputation: 323

Having trouble accessing the table I just created in Spark

I am trying to filter out the crime data that happened on Sunday in the SF downtown area, however, after I successfully label them with 1(in downtown) and 0(not in downtown), I try to count the number of crimes each "Sunday" at "SF downtown". But it said table or view not found for some reason. I don't understand that as I just created(in the same box) the table I am trying to access. enter image description here So what I tried to do is to use the "createOrReplaceTempView()" to make it a table. After I used that, I got a new error instead of the old one. Can someone please tell me if this is the right approach to solve the "table not found" error? And also, I am confused about how to count the row that is labeled as 1. This is what I have right now and I don't understand why mine is not working. Here are two pictures of how the table looks like: How the table looks like The column that labels if the crime is in Downtown

And here is the error I am getting:Error picture

from pyspark.sql.functions import when
import pyspark.sql.functions as F

#First, pick out the crime cases that happens on Sunday
q3_sunday = spark.sql("SELECT * FROM sf_crime WHERE DayOfWeek='Sunday'")
#Then, we add a new column for us to filter out(identify) if the crime is in DT
q3_final = q3_sunday.withColumn("isDT",F.when(((q3_sunday.X.between(-122.4313,-122.4213))& 
(q3_sunday.Y.between(37.7540,37.7740))),1).otherwise(0))
#Last but not least, I count the crimes that happens each Sunday at SF downtown with the newly added 
column as well as the True(1) and False(0) column
q3_final.createOrReplaceTempView("q3final_tbl")
sunday_dt = spark.sql("SELECT isDT, COUNT(*) AS Count FROM q3final_tbl WHERE isDT='1' GROUP BY DayofWeek ORDER BY Count DESC")

Upvotes: 0

Views: 361

Answers (1)

Shantanu Kher
Shantanu Kher

Reputation: 1054

You need to correct GROUP BY clause in the SQL. It should be -

SELECT isDT, COUNT(*) AS Count FROM q3final_tbl WHERE isDT='1' GROUP BY isDT ORDER BY Count DESC

Upvotes: 1

Related Questions