Reputation: 507
DB schema
CREATE TABLE newsletter_status
(
cryptid varchar(255) NOT NULL,
status varchar(25),
regDat timestamp,
confirmDat timestamp,
updateDat timestamp,
deleteDat timestamp
);
There are rows with the same cryptid
, I need to squash them to one row. So the cryptid
becomes effectively unique. The complexity comes from the fact that I need to compare dates by rows as well as by columns. How to implement that?
The rule I need to use is:
Example:
002bc5 | new | 2010.01.15 | 2001.01.15 | NULL | 2020.01.10
002bc5 | confirmed | NULL | 2020.01.30 | 2020.01.15 | 2020.01.15
002bc5 | deactivated | NULL | NULL | NULL | 2020.12.03
needs to be squashed into:
002bc5 | deactivated | 2010.01.15 | 2020.01.30 | 2020.01.15 | 2020.12.03
The status deactivated
is taken because the timestamp 2020.12.03 is the latest
Upvotes: 0
Views: 75
Reputation: 6751
What you need to get the status is to sort rowset by dates in descending order. In Oracle there is agg_func(<arg>) keep (dense_rank first ...)
, in other databases it can be replaced with row_number()
and filter. Because analytic functions in HANA works not so good sometimes, I suggest to use the only one aggregate function I know in HANA that supports ordering inside - STRING_AGG
- with little trick. If you have not a thousands of rows with statuses (i.e. concatenated status will not be greater 4000 for varchar), it will work. This is the query:
select
cryptid,
max(regDat) as regDat,
max(confirmDat) as confirmDat,
max(updateDat) as updateDat,
max(deleteDat) as deleteDat,
substr_before(
string_agg(status, '|'
order by greatest(
ifnull(regDat, date '1000-01-01'),
ifnull(confirmDat, date '1000-01-01'),
ifnull(updateDat, date '1000-01-01'),
ifnull(deleteDat, date '1000-01-01')
) desc),
'|'
) as status
from newsletter_status
group by cryptid
Upvotes: 1
Reputation: 222482
I would start by ranking the rows of each cryptid
by the greatest value of the date column. Then we can use that information to identify the latest status per cryptid
, and aggregate :
select cryptid,
max(case when rn = 1 then status end) as status,
max(regDate) as regDat,
max(confirmDat) as confirmDat,
max(updatedDat) as updatedDat,
max(deleteDat) as deleteDat
from (
select ns.*,
row_number() over(
partition by cryptid
order by greatest(
coalesce(regDate, '0001-01-01'),
coalesce(confirmDat, '0001-01-01'),
coalesce(updatedDat, '0001-01-01'),
coalesce(deleteDat, '0001-01-01')
)
) rn
from newsletter_status ns
) ns
group by cryptid
Upvotes: 1
Reputation: 1269843
You can use aggregation:
select cryptid,
coalesce(max(case when status = 'deactivated' then status end)
max(case when status = 'confirmed' then status end),
max(case when status = 'new' then status end),
) as status,
max(regDat),
max(confirmDat),
max(updateDat),
max(deleteDat)
from newsletter_status
group by cryptid;
The coalesce()
s are a trick to get the statuses in priority order.
EDIT:
If you just want the row with the latest timestamp:
select cryptid,
max(case when seqnum = 1 then status end) as status_on_max_date,
max(regDat),
max(confirmDat),
max(updateDat),
max(deleteDat)
from (select ns.*,
row_number() over (partition by cryptid
order by greatest(coalesce(regDat, '2000-01-01'),
coalesce(confirmDat, '2000-01-01'),
coalesce(updateDat, '2000-01-01'),
coalesce(deleteDat, '2000-01-01')
)
) as seqnum
from newsletter_status ns
) ns
group by cryptid;
Upvotes: 1