John
John

Reputation: 521

SQL select with COUNT(*) > minimum number

I have two tables. One stores "locations":

TABLE location (
    ID               serial             PRIMARY KEY,
    name             text               NOT NULL,
    description      text               NOT NULL
);

Each location has many rows of "data":

TABLE data(
    ID               smallint           REFERENCES location(ID),
    date             date,               
    rainfall         int                
);

I would like to find all locations that have "data" spanning a given period, and which have at least "min" number of values within that period. I have tried this:

SELECT location.ID, location.name 
FROM location 
JOIN data 
ON data.id = location.id 
GROUP BY location.id 
HAVING MIN(data.date) <= '$start_date' 
       AND
       MAX(data.date) >= '$end_date'  
       AND 
       (SELECT COUNT(*) FROM data WHERE data.date >= '$start_date' AND data.date <= '$end_date') >= '$min' 
ORDER BY location.ID

If I take out the second last line (the SELECT COUNT line) it correctly returns locations that have data spanning the desired period (but doesn't have the constraint that the number of values within the desired period is greater than or equal to "min").

Can anyone tell me how to impose the constraint? ie. what is wrong with my "SELECT COUNT line".

The following example data may help clarify my problem:

Example data:

location:
ID = 1, name = "London",   description = "test location 1"
ID = 2, name = "New York", description = "test location 2"

data:

ID = 1, date = 2001-01-01, rainfall = 0.0
ID = 1, date = 2001-01-02, rainfall = 0.0
ID = 1, date = 2001-01-03, rainfall = 0.0
ID = 1, date = 2001-01-04, rainfall = 0.0
ID = 1, date = 2001-01-05, rainfall = 0.0
ID = 1, date = 2001-01-06, rainfall = 0.0
ID = 1, date = 2001-01-07, rainfall = 0.0

ID = 2, date = 2001-01-01, rainfall = 0.0
ID = 2, date = 2001-01-04, rainfall = 0.0
ID = 2, date = 2001-01-05, rainfall = 0.0
ID = 2, date = 2017-01-01, rainfall = 0.0  # Not within the desired period, so is excluded
ID = 2, date = 2017-01-02, rainfall = 0.0  # Not within the desired period, so is excluded
ID = 2, date = 2017-01-03, rainfall = 0.0  # Not within the desired period, so is excluded
ID = 2, date = 2017-01-04, rainfall = 0.0  # Not within the desired period, so is excluded

If I search for all locations with data between 2001-01-01 and 2001-01-07, and which have at least 6 data values, it should only return location 1 (ID=1). The second location (ID=2) should not be returned as it does not have the required number of values within the desired period.

Upvotes: 1

Views: 874

Answers (2)

Joakim Danielson
Joakim Danielson

Reputation: 51971

This is a new version of my answer since I had misunderstood 'spanning'. My interpretation of the question is still though that when counting values we should only do it within the given period

SELECT l.id, l.name 
FROM location l
JOIN location_data d 
ON l.id = d.id 
GROUP BY l.id, l.name
HAVING MIN(d.date) <= '2017-01-01' 
   AND MAX(d.date) >= '2017-12-31'  
   AND (SELECT COUNT(b.id)
        FROM location_data b
        WHERE b.date BETWEEN '2017-01-01' AND '2017-12-31' 
        AND b.id = l.id) >= 2

I renamed the table data to location_data in my test db but that might be obvious :)

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269933

First, you should be passing in values such as dates as parameters, not as strings. Second, COUNT() returns a number, so the comparison should be to a number, not a string.

You should be able to do what you want with group by and having. One method is:

SELECT l.ID, l.name 
FROM location l JOIN
     data d
     ON d.id = l.id 
GROUP BY l.id, l.name
HAVING MIN(d.date) <= '$start_date' AND
       MAX(d.date) >= '$end_date' AND
       COUNT(*) >= $min
ORDER BY l.ID;

Upvotes: 1

Related Questions