Reputation: 857
I'm very bad at explaining, so let me try to lay out my issue. I have a table that resembles the following:
Source Value User
======== ======= ======
old1 1 Phil
new 2 Phil
old2 3 Phil
new 4 Phil
old1 1 Mike
old2 2 Mike
new 1 Jeff
new 2 Jeff
What I need to do is create a query that gets values for users based on the source and the value. It should follow this rule:
For every user, get the highest value. However, disregard the 'new' source if either 'old1' or 'old2' exists for that user.
So based on those rules, my query should return the following from this table:
Value User
======= ======
3 Phil
2 Mike
2 Jeff
I've come up with a query that does close to what is asked:
SELECT MAX([Value]), [User]
FROM
(
SELECT CASE [Source]
WHEN 'old1' THEN 1
WHEN 'old2' THEN 1
WHEN 'new' THEN 2
END AS [SourcePriority],
[Value],
[User]
FROM #UserValues
) MainPriority
WHERE [SourcePriority] = 1
GROUP BY [User]
UNION
SELECT MAX([Value]), [User]
FROM
(
SELECT CASE [Source]
WHEN 'old1' THEN 1
WHEN 'old2' THEN 1
WHEN 'new' THEN 2
END AS [SourcePriority],
[Value],
[User]
FROM #UserValues
) SecondaryPriority
WHERE [SourcePriority] = 2
GROUP BY [User]
However this returns the following results:
Value User
======= ======
3 Phil
4 Phil
2 Mike
2 Jeff
Obviously that extra value for Phil=4 is not desired. How should I attempt to fix this query? I also understand that this is a pretty convoluted solution and that it can probably be more easily solved by proper use of aggregates, however I'm not too familiar with aggregates yet which resulted in me resorting to a union. Essentially I'm looking for help creating the cleanest-looking solution possible.
Here is the SQL code if anyone wanted to populate the table themselves to give it a try:
CREATE TABLE #UserValues
(
[Source] VARCHAR(10),
[Value] INT,
[User] VARCHAR(10)
)
INSERT INTO #UserValues VALUES
('old1', 1, 'Phil'),
('new', 2, 'Phil'),
('old2', 3, 'Phil'),
('new', 4, 'Phil'),
('old1', 1, 'Mike'),
('old2', 2, 'Mike'),
('new', 1, 'Jeff'),
('new', 2, 'Jeff')
Upvotes: 4
Views: 122
Reputation: 1269873
I think you want:
select top (1) with ties uv.*
from (select uv.*,
sum(case when source in ('old1', 'old2') then 1 else 0 end) over (partition by user) as cnt_old
from #UserValues uv
) uv
where cnt_old = 0 or source <> 'new'
order by row_number() over (partition by user order by value desc);
Upvotes: 0
Reputation: 1101
I think you should consider setting up an XREF table to define which source is what priority, for a possible more complicated priorisation in the future. I do it with a temp table:
CREATE TABLE #SourcePriority
(
[Source] VARCHAR(10),
[SourcePriority] INT
)
INSERT INTO #SourcePriority VALUES
('old1', 1),
('old2', 1),
('new', 2)
You might also create a View to look up the SourcePriority to the original table. I do it wit a CTE + possible implementation how to look up the top priority with the highest value:
;WITH CTE as (
SELECT s.[SourcePriority], u.[Value], u.[User]
FROM #UserValues as u
INNER JOIN #SourcePriority as s on u.[Source] = s.[Source]
)
SELECT MAX (v.[Value]) as [Value], v.[User]
FROM (
SELECT MIN ([SourcePriority]) as [TopPriority], [User]
FROM cte
GROUP BY [User]
) as s
INNER JOIN cte as v
ON s.[User] = v.[User] and s.[TopPriority] = v.[SourcePriority]
GROUP BY v.[User]
Upvotes: 1
Reputation: 5922
with raw_data
as (
select row_number() over(partition by a.[user] order by a.value desc) as rnk
,count(case when a.source in('old1','old2') then 1 end) over(partition by a.[user]) as cnt_old
,a.*
from uservalues a
)
,curated_data
as(select *
,row_number() over(partition by rd.[user] order by rd.value desc) as rnk2
from raw_data rd
where 0 = case when rnk=1 and source='new' and cnt_old>0 then 1 else 0 end
)
select *
from curated_data
where rnk2=1
I am doing the following
raw_data ->First i get rank the values on the basis of max available value per user. Also i get to check if the user has any records which are pegged at old1 or old2 in the source column
curated_data ->i eliminate records which have the highest value(rnk=1) as new if they have cnt_old >0. Also now i rank(rnk2) the records on the highest value available from this result set.
I select the highest available value from curated_data(ie rnk2=1)
Upvotes: 1
Reputation: 181
You can solve it fairly easily without resorting to window functions. In this case, you need the maximum value where ((not new) OR (there isn't an old1 or old2 entry)).
Here's a query that works correctly with your sample data:
SELECT
MAX(U1.[Value]) as 'Value'
,U1.[User]
FROM
#UserValues U1
WHERE
U1.[Source] <> 'new'
OR NOT EXISTS (SELECT * FROM #UserValues U2 WHERE U2.[User] = U1.[User] AND U2.[Source] IN ('old1','old2'))
GROUP BY U1.[User]
Upvotes: 2
Reputation: 50163
You can use priorities order by
with row_number()
:
select top (1) with ties uv.*
from #UserValues uv
order by row_number() over (partition by [user]
order by (case when source = 'old2' then 1 when source = 'old1' then 2 else 3 end), value desc
);
However, if you have only source
limited with 3 then you can also do :
. . .
order by row_number() over (partition by [user]
order by (case when source = 'new' then 2 else 1 end), value desc
)
Upvotes: 1