Reputation: 2045
I'm dealing with the yelp dataset on sqlite3
, and I have a schema as follows:
CREATE TABLE businesses
(business_id text primary key,
name text,
full_address text,
city text,
stars decimal(1,1),
ratings integer);
I run the following query:
select name, stars from businesses where ratings >= 400;
And I get the following:
Pho Kim Long |3.5
Pho So 1 |3.5
Ichiza |4
...
Guy Fieri |3.5
Giada |4
If I run:
select name, min(stars) from businesses where ratings >= 400;
I get The Quad Las Vegas Resort & Casino|2
But if I run:
select name
from (select name, min(stars)
from businesses
where ratings >= 400);
I get Giada
.
I know that there's a way to do this with LIMIT X
, but why does this occur in the first place? Is there something about the SQLite parser that I'm neglecting?
Upvotes: 2
Views: 711
Reputation: 58544
The other answers here are correct (and thus upvoted), but I'm not sure they quite get you from where you are to where you want to be.
Typically, aggregate functions like MIN/MAX/COUNT are used with GROUP BY clauses. For example:
-- Count the number of businesses per city
SELECT city, COUNT(1) FROM biz GROUP BY city;
-- Find the minimum rating of any business per city
SELECT city, MIN(stars) FROM biz GROUP BY city;
What happens when you omit the GROUP BY clause but keep the non-aggregated field ("city") in the query? You're only going to get one MIN()imum, but which "city" will it be? What if two or more cities are tied for lowest rated businesses?
-- Er, which city will it be?
SELECT city, MIN(stars) FROM biz;
Some RDBMS prohibit this construction entirely, and the SQL specs disallow it.* However, SQLite and some others will return a single row in this case, with the minimum rating across the entire table and an arbitrarily chosen "city". You cannot rely on that row to represent all the aggregates. Indeed, depending on the RDBMS version it may not even represent any properly aggregated result.** As you've seen, SQLite is free to pick one row sometimes, and another row other times.
If I understand, you want to select the very worst rated restaurant(s) (lowest "stars"), but only of those restaurants with at least 400 reviews ("ratings").
Either of @JuanCarlosOropeza's approaches work: find MIN(stars)
across all records WHERE ratings >= 400
and then use that to select the name
of each business whose stars
equal that. (SQL analytic function, unsupported in SQLite at the time of this writing, offer more ways to compute the same.)
* As @harmic notes, the situation is a little more nuanced than that.
** As @CL. points out, later versions of SQLite do guarantee that the row selected does match the aggregate function. That is, the name
you get does actually have the MIN(stars)
rating.
Upvotes: 0
Reputation: 180060
SQLite 3.7.11 or later guarantees that the values of unaggregated columns come from a row that matches the min() or max().
However, unless you are using one of the latest versions, the query planner will optimize away the min()/max() values if they are not returned from the query.
You must list the min(stars)
in the outermost query.
Upvotes: 1
Reputation: 30587
In the SQLite docs, it says:
If the SELECT statement is an aggregate query without a GROUP BY clause, then each aggregate expression in the result-set is evaluated once across the entire dataset. Each non-aggregate expression in the result-set is evaluated once for an arbitrarily selected row of the dataset. The same arbitrarily selected row is used for each non-aggregate expression.
In other words, the actual name you are getting is purely random in this case, it does not bear any relation to the row that actually has the minimum value of stars
.
According to the SQL-92 standard, in a query containing aggregate expressions and non-aggregate expressions all of the non-aggregate expressions must appear in the GROUP BY clause. SQL-2003 has a similar but somewhat more complicated set of rules. This blog article provides a summary.
Some databases allow making queries that include aggregate and non-aggregate expressions without any GROUP BY clause, or without all non-aggregates included in the GROUP BY clause, but as you have discovered the results may be indeterminate. Other databases will display an error and refuse to run the query.
It is hard to give specific advice on how to correct your query because you have not stated what output you are trying to get. If you are trying to find out which row has the minimum value of stars
then one of the proposals in Juan's answer should work.
Upvotes: 2
Reputation: 48197
The problem is your MIN()
function bring the smallest value for stars
, but not the name match for that row.
You can do a cross join
SELECT name
FROM businesses b,
( SELECT min(stars) as MinValue
FROM businesses
WHERE ratings >= 400) as M
WHERE b.stars = M.MinValue;
OR Inner Select
SELECT name
FROM businesses b
WHERE b.stars = ( SELECT min(stars) as MinValue
FROM businesses
WHERE ratings >= 400);
Upvotes: 1