Reputation: 4346
This is a postgresql
problem.
PostgreSQL 8.3.3 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-9).
The table looks like:
date_time other_column
2012-11-01 00:00:00 ...
2012-11-02 01:00:00 ...
2012-11-02 02:00:00 ...
2012-11-02 03:00:00 ...
2012-11-02 04:00:00 ...
2012-11-03 05:00:00 ...
2012-11-03 06:00:00 ...
2012-11-05 00:00:00 ...
2012-11-07 00:00:00 ...
2012-11-07 00:00:00 ...
...
I want to select at most 3 records per day from a specific date range.
For example, I want to select at most 3 records from 2012-11-02 to 2012-11-05.
The expected result
would be:
date_time other_column
2012-11-02 01:00:00 ...
2012-11-02 02:00:00 ...
2012-11-02 03:00:00 ...
2012-11-03 05:00:00 ...
2012-11-03 06:00:00 ...
2012-11-05 00:00:00 ...
I have spent a few hours on this and still cannot figure it out. Please help me. :(
UPDATE: The current sql I tried could only select one record per day:
SELECT DISTINCT ON (TO_DATE(SUBSTRING((date_time || '') FROM 1 FOR 10), 'YYYY-MM-DD')) *
FROM myTable
WHERE date_time >= '20121101 00:00:00'
AND date_time <= '20121130 23:59:59'
Upvotes: 3
Views: 1446
Reputation: 656401
I want to select at most 3 records per day from a specific date range.
SELECT date_time, other_column
FROM (
SELECT *, row_number() OVER (PARTITION BY date_time::date) AS rn
FROM tbl
WHERE date_time >= '2012-11-01 0:0'
AND date_time < '2012-12-01 0:0'
) x
WHERE rn < 4;
Use the window function row_number()
. rank()
or dense_rank()
would be wrong according to the question - more than 3 records might be selected with timestamp duplicates.
Since you do not define which rows you want per day, the correct answer is not to include an ORDER BY
clause in the window function. Gives you an arbitrary selection, which matches the question.
I changed your WHERE
clause from
WHERE date_time >= '20121101 00:00:00'
AND date_time <= '20121130 23:59:59'
to
WHERE date_time >= '2012-11-01 0:0'
AND date_time < '2012-12-01 0:0'
Your syntax would fail for corner cases like '20121130 23:59:59.123'
.
What @Craig suggested:
date_time::date BETWEEN '2012-11-02' AND '2012-11-05'
.. would work correctly, but is an anti-pattern regarding performance. If you apply a cast or a function to your database column in the expression, plain indexes cannot be used.
Best solution: Upgrade to a more recent version, preferably to the current version 9.2.
Other solutions:
For only few days you could employ UNION ALL
:
SELECT date_time, other_column
FROM tbl t1
WHERE date_time >= '2012-11-01 0:0'
AND date_time < '2012-11-02 0:0'
LIMIT 3
)
UNION ALL
(
SELECT date_time, other_column
FROM tbl t1
WHERE date_time >= '2012-11-02 0:0'
AND date_time < '2012-11-03 0:0'
LIMIT 3
)
...
Parenthesis are not optional here.
For more days there are workarounds with generate_series()
- something like I posted here (including a link to more).
I might have solved it with a plpgsql function back in the old days before we had window functions:
CREATE OR REPLACE FUNCTION x.f_foo (date, date, integer
, OUT date_time timestamp, OUT other_column text)
RETURNS SETOF record AS
$BODY$
DECLARE
_last_day date; -- remember last day
_ct integer := 1; -- count
BEGIN
FOR date_time, other_column IN
SELECT t.date_time, t.other_column
FROM tbl t
WHERE t.date_time >= $1::timestamp
AND t.date_time < ($2 + 1)::timestamp
ORDER BY t.date_time::date
LOOP
IF date_time::date = _last_day THEN
_ct := _ct + 1;
ELSE
_ct := 1;
END IF;
IF _ct <= $3 THEN
RETURN NEXT;
END IF;
_last_day := date_time::date;
END LOOP;
END;
$BODY$ LANGUAGE plpgsql STABLE STRICT;
COMMENT ON FUNCTION f_foo(date3, date, integer) IS 'Return n rows per day
$1 .. date_from (incl.)
$2 .. date_to (incl.)
$3 .. maximim rows per day';
Call:
SELECT * FROM f_foo('2012-11-01', '2012-11-05', 3);
Upvotes: 3
Reputation: 324375
The following answers all use date_trunc('day',date_time)
or just cast to date
to truncate a timestamp to a date. There's no need to jump through hoops with date formatting and strings. See Date/time functions in the manual.
This SQLFiddle shows three possible answers: http://sqlfiddle.com/#!12/0fd51/14, all of which produce the same result for the input data (but not necessarily the same result if date_time
can have duplicates in it).
To solve your problem you could use a correlated subquery with a limit to generate an IN-list to filter on:
SELECT a.date_time, a.other_column
FROM table1 a
WHERE a.date_time IN (
SELECT b.date_time
FROM table1 b
WHERE b.date_time IS NOT NULL
AND a.date_time::date = b.date_time::date
ORDER BY b.date_time
LIMIT 3
)
AND a.date_time::date BETWEEN '2012-11-02' AND '2012-11-05';
This should be the most portable approach - though it won't work with MySQL (at least as of 5.5) because MySQL doesn't support LIMIT
in a subquery used in an IN
clause. It works in SQLite3 and PostgreSQL, though, and should work in most other DBs.
Another option would be to select the range of dates you wanted, annotate the rows within the range with a row number using a window function, then filter the output to exclude excess rows:
SELECT date_time, other_column
FROM (
SELECT
date_time,
other_column,
rank() OVER (PARTITION BY date_trunc('day',date_time) ORDER BY date_time) AS n
FROM Table1
WHERE date_trunc('day',date_time) BETWEEN '2012-11-02' AND '2012-11-05'
ORDER BY date_time
) numbered_rows
WHERE n < 4;
If ties are a possibility, ie if date_time
is not unique, then consider using either the rank
or dense_rank
window functions instead of row_number
to get deterministic results, or add an additional clause to the ORDER BY
in row_number
to break the tie.
If you use rank
then it'll include none of the rows if it can't fit all of them in; if you use dense_rank
it'll include all of them even if it has to go over the 3-row-per-day limit to do so.
All sorts of other processing are possible this way too, using the window specification.
Here's yet another formulation that uses array aggregation and slicing, which is completely PostgreSQL specific but fun.
SELECT b.date_time, b.other_column
FROM (
SELECT array_agg(a.date_time ORDER BY a.date_time)
FROM table1 a
WHERE a.date_time::date BETWEEN '2012-11-02'
AND '2012-11-05'
GROUP BY a.date_time::date
) x(arr)
INNER JOIN table1 b ON (b.date_time = ANY (arr[1:3]));
Upvotes: 3
Reputation: 520
I would use a sub-select and a left outer join. This should do the trick:
select distinct(date_format(a.date_time,"%Y-%m-%d")) date_time, b.* from table a
left outer join (
select date_format(date_time,"%Y-%m-%d") dt, * from table limit 3
) b
on date_format(a.date_time,"%Y-%m-%d") = b.dt;
Upvotes: -3