I want to get the name, star rating, and review count of the restaurants that have 5 stars and more than 1000 reviews.
def fiveStarBusinessesSQL(): DataFrame = {
  spark.sql("SELECT name, stars, review_count FROM yelpBusinessesView WHERE stars == 5 && review_count >= 1000")
}
I don't understand why this fails. The query is about as basic as SQL gets, IMO.
Here's the error I get:
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'FROM' expecting <EOF>(line 1, pos 33)
== SQL ==
SELECT name, stars, review_count FROM yelpBusinessesView WHERE stars == 5 && review_count >= 1000
---------------------------------^^^
I'm working with the Yelp Dataset. Here's an example of a record in yelpBusinessesView:
{"business_id":"1SWheh84yJXfytovILXOAQ","name":"Arizona Biltmore Golf Club","address":"2818 E Camino Acequia Drive","city":"Phoenix","state":"AZ","postal_code":"85016","latitude":33.5221425,"longitude":-112.0184807,"stars":3.0,"review_count":5,"is_open":0,"attributes":{"GoodForKids":"False"},"categories":"Golf, Active Life","hours":null}
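A likely cause, offered as a sketch rather than a confirmed diagnosis: `&&` is a Scala operator (and works on `Column` expressions in the DataFrame API), but as far as I know it is not a logical operator in Spark SQL's grammar, which expects `AND`. The parser may report the position at `FROM` even though the problem sits later in the `WHERE` clause. The version below also uses `>` instead of `>=`, since the stated goal is "more than 1000 reviews". It assumes a `SparkSession` is passed in explicitly and that `yelpBusinessesView` is already registered as a temp view; the object name and the `spark` parameter are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object FiveStarBusinesses {

  // SQL variant: use AND (not &&) inside the SQL string.
  // Assumes the temp view yelpBusinessesView is registered on this session.
  def fiveStarBusinessesSQL(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT name, stars, review_count
        |FROM yelpBusinessesView
        |WHERE stars = 5 AND review_count > 1000""".stripMargin)

  // DataFrame API variant: here && is valid because it combines Column objects.
  def fiveStarBusinessesDF(spark: SparkSession): DataFrame =
    spark.table("yelpBusinessesView")
      .filter(col("stars") === 5 && col("review_count") > 1000)
      .select("name", "stars", "review_count")
}
```

Both functions should return the same rows; the second one shows where `&&` does belong, which may be the source of the mix-up.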