Reputation: 5028
I have a table like this:
+-------+--------------+--------------+-------------+-------------+-------------+-------------+
| Study | Point_number | Date_created | condition_A | condition_B | condition_C | condition D |
+-------+--------------+--------------+-------------+-------------+-------------+-------------+
| 1 | 1 | 01-01-2001 | 1 | 1 | 0 | 1 |
| 1 | 2 | 01-01-2001 | 0 | 1 | 1 | 0 |
| 1 | 3 | 01-01-2001 | 0 | 1 | 0 | 0 |
+-------+--------------+--------------+-------------+-------------+-------------+-------------+
The condition_A, B, C and D are used to classify data points into groups. So each unique combination of those columns would be a group. For each group, I want to retrieve the last 200 rows.
At the moment I have something like this:
select * from my_table where point_number <= 200;
In order to do this for each group I could do:
select * from my_table where point_number <= 200 where condition_A = 1 and condition_B = 1 and condition_C = 1 and condition D = 1
union all
select * from my_table where point_number <= 200 where condition_A = 1 and condition_B = 1 and condition_C = 1 and condition D = 0
union all...;
The problem with this approach is that there are many, many combinations, and it would be good to make the query as flexible as possible. How can I avoid doing UNION ALLs and have the query automatically retrieve 200 rows for each group?
Upvotes: 2
Views: 213
Reputation: 1269803
Your original query:
select *
from my_table
where point_number <= 200;
Should do what you want -- retrieve values of point_number
less than 200. It should do this for each group.
If you want 200 values in each group, then something like this might be what you really want:
select t.*
from (select t.*,
row_number() over (partition by a, b, c, d order by point_number desc) as seqnum
from my_table
) t
where seqnum <= 200;
This assumes that point_number()
is increasing and larger values are "more recent". You might want to use date_created
in the order by
rather than point_number
.
Upvotes: 3
Reputation: 23578
This should help you work out what you need to do:
with sample_data as (select 1 id, 1 ca, 0 cb from dual union all
select 2 id, 1 ca, 1 cb from dual union all
select 3 id, 1 ca, 1 cb from dual union all
select 4 id, 0 ca, 0 cb from dual union all
select 5 id, 0 ca, 1 cb from dual union all
select 6 id, 0 ca, 1 cb from dual union all
select 7 id, 0 ca, 0 cb from dual union all
select 8 id, 1 ca, 0 cb from dual union all
select 9 id, 1 ca, 1 cb from dual union all
select 10 id, 0 ca, 1 cb from dual union all
select 11 id, 0 ca, 0 cb from dual union all
select 12 id, 1 ca, 0 cb from dual union all
select 13 id, 1 ca, 0 cb from dual union all
select 14 id, 0 ca, 1 cb from dual union all
select 15 id, 0 ca, 0 cb from dual union all
select 16 id, 1 ca, 1 cb from dual union all
select 17 id, 0 ca, 0 cb from dual)
select id,
ca,
cb,
row_number() over (partition by ca, cb order by id) rn
from sample_data;
ID CA CB RN
---------- ---------- ---------- ----------
4 0 0 1
7 0 0 2
11 0 0 3
15 0 0 4
17 0 0 5
5 0 1 1
6 0 1 2
10 0 1 3
14 0 1 4
1 1 0 1
8 1 0 2
12 1 0 3
13 1 0 4
2 1 1 1
3 1 1 2
9 1 1 3
16 1 1 4
Basically, you need to find out the row number of each row per each group - a job for analytic functions, specifically the row_number()
analytic function.
If you've not come across analytic functions before, basically they're similar to aggregate functions (so you can find results across groups, aka "partition by") without collapsing the rows. I would recommend you do some research on this, if you aren't already familiar with them!
Anyway, once you've assigned your row numbers, you can then throw an outer query around the sql to filter on the row number, eg:
with sample_data as (select 1 id, 1 ca, 0 cb from dual union all
select 2 id, 1 ca, 1 cb from dual union all
select 3 id, 1 ca, 1 cb from dual union all
select 4 id, 0 ca, 0 cb from dual union all
select 5 id, 0 ca, 1 cb from dual union all
select 6 id, 0 ca, 1 cb from dual union all
select 7 id, 0 ca, 0 cb from dual union all
select 8 id, 1 ca, 0 cb from dual union all
select 9 id, 1 ca, 1 cb from dual union all
select 10 id, 0 ca, 1 cb from dual union all
select 11 id, 0 ca, 0 cb from dual union all
select 12 id, 1 ca, 0 cb from dual union all
select 13 id, 1 ca, 0 cb from dual union all
select 14 id, 0 ca, 1 cb from dual union all
select 15 id, 0 ca, 0 cb from dual union all
select 16 id, 1 ca, 1 cb from dual union all
select 17 id, 0 ca, 0 cb from dual),
results as (select id,
ca,
cb,
row_number() over (partition by ca, cb order by id) rn
from sample_data)
select *
from results
where rn <= 3;
ID CA CB RN
---------- ---------- ---------- ----------
4 0 0 1
7 0 0 2
11 0 0 3
5 0 1 1
6 0 1 2
10 0 1 3
1 1 0 1
8 1 0 2
12 1 0 3
2 1 1 1
3 1 1 2
9 1 1 3
Upvotes: 1
Reputation: 2505
How about this:
SELECT *
FROM my_table
WHERE point_number <= 200
AND condition_A = 1
AND condition_B = 1
AND condition_C = 1
AND condition_D IN (0, 1);
Upvotes: 1