Reputation: 457
This seems straight forward, but I am having trouble figuring it out. I have a table, where I am comparing records within a group (group_id). I gave my records a flag based on certain criteria that was pertinent to my study. There are about 10 flags that I need to look at. In some cases, within the group, only 1 record is flagged. In other cases, both records are flagged.
What I would like to do now is partition my group_id based on dates for each flag designation. I have 3 time periods I'd like to look at: 2016, 2010 -2015, before 2010, and both records are the same. These are reflected in my 'file_date' field which are not set up as a date type data. instead, they have an abcsum (summer) and abcwin (winter) designation, with the year. In some cases this field is null.
What I would like to do is create another flag that the 'winner' of the partition would take based on which one of those date criteria was fit. The importance of the criteria is from most recent to least recent. Here is some sample data I hope will explain it further:
------------------------------
|group_id | file_date | flag |
-------------------------------
| a | abcsum16 | 3 |
| a | abcwin16 | |
| b | null | 4 |
| b | abcsum15 | 4 |
| c | abcwin16 | 7 |
| c | abcwin16 | 7 |
| d | abcsum09 | 3 |
| d | null | |
-------------------------------
The output would be:
|group_id | file_date | flag | date_flag |
--------------------------------------------
| a | abcsum16 | 3 | 1 |
| b | abcsum15 | 4 | 2 |
| c | abcwin16 | 7 | 0 |
| c | abcwin16 | 7 | 0 |
| d | abcsum09 | 3 | 3 |
-------------------------------------------
Upvotes: 0
Views: 155
Reputation:
Here is one way to do it. Please note, however, that it would (likely) be much better to use these techniques earlier in your process, if possible. For example, if file_date
is not stored that way in your base table, but instead it is the result of some processing, it would be better to use the base data instead of the file_date
expression. Also perhaps the analytic functions can be used sooner so you do fewer passes through the data (assuming, of course, that performance is important; sometimes it isn't).
with
test_data ( group_id, file_date, flag ) as (
select 'a', 'abcsum16', 3 from dual union all
select 'a', 'abcwin16', null from dual union all
select 'b', null , 4 from dual union all
select 'b', 'abcsum15', 4 from dual union all
select 'c', 'abcwin16', 7 from dual union all
select 'c', 'abcwin16', 7 from dual union all
select 'd', 'abcsum09', 3 from dual union all
select 'd', null , null from dual
)
-- end of test data (not part of the SQL query); query begins BELOW THIS LINE
select group_id, file_date, flag,
case when count(*) over (partition by group_id) = 2
and
count(distinct file_date) over (partition by group_id) = 1 then 0
when to_number(substr(file_date, -2)) = 16 then 1
when to_number(substr(file_date, -2)) between 10 and 15 then 2
else 3 end
as date_flag
from test_data
where file_date is not null and flag is not null
;
GROUP_ID FILE_DATE FLAG DATE_FLAG
-------- --------- ---- ----------
a abcsum16 3 1
b abcsum15 4 2
c abcwin16 7 0
c abcwin16 7 0
d abcsum09 3 3
5 rows selected.
Upvotes: 1