Jon Warren
Jon Warren

Reputation: 857

Trying to simplify a SQL query without UNION

I'm very bad at explaining, so let me try to lay out my issue. I have a table that resembles the following:

 Source    Value    User
========  =======  ======
  old1       1      Phil
  new        2      Phil
  old2       3      Phil
  new        4      Phil
  old1       1      Mike
  old2       2      Mike
  new        1      Jeff
  new        2      Jeff

What I need to do is create a query that gets values for users based on the source and the value. It should follow this rule:

For every user, get the highest value. However, disregard the 'new' source if either 'old1' or 'old2' exists for that user.

So based on those rules, my query should return the following from this table:

 Value    User
=======  ======
   3      Phil
   2      Mike
   2      Jeff

I've come up with a query that does close to what is asked:

SELECT      MAX([Value]), [User]
FROM
(
    SELECT  CASE [Source]
                WHEN 'old1' THEN 1
                WHEN 'old2' THEN 1
                WHEN 'new'  THEN 2
            END                 AS [SourcePriority],
            [Value],
            [User]
    FROM    #UserValues
) MainPriority
WHERE       [SourcePriority] = 1
GROUP BY    [User]
UNION
SELECT      MAX([Value]), [User]
FROM
(
    SELECT  CASE [Source]
                WHEN 'old1' THEN 1
                WHEN 'old2' THEN 1
                WHEN 'new'  THEN 2
            END                 AS [SourcePriority],
            [Value],
            [User]
    FROM    #UserValues
) SecondaryPriority
WHERE       [SourcePriority] = 2
GROUP BY    [User]

However this returns the following results:

 Value    User
=======  ======
   3      Phil
   4      Phil
   2      Mike
   2      Jeff

Obviously that extra value for Phil=4 is not desired. How should I attempt to fix this query? I also understand that this is a pretty convoluted solution and that it can probably be more easily solved by proper use of aggregates, however I'm not too familiar with aggregates yet which resulted in me resorting to a union. Essentially I'm looking for help creating the cleanest-looking solution possible.

Here is the SQL code if anyone wanted to populate the table themselves to give it a try:

CREATE TABLE #UserValues
(
    [Source] VARCHAR(10),
    [Value]  INT,
    [User]   VARCHAR(10)
)
INSERT INTO #UserValues VALUES
('old1', 1, 'Phil'),
('new',  2, 'Phil'),
('old2', 3, 'Phil'),
('new',  4, 'Phil'),
('old1', 1, 'Mike'),
('old2', 2, 'Mike'),
('new',  1, 'Jeff'),
('new',  2, 'Jeff')

Upvotes: 4

Views: 122

Answers (5)

Gordon Linoff
Gordon Linoff

Reputation: 1269873

I think you want:

select top (1) with ties uv.*
from (select uv.*,
             sum(case when source in ('old1', 'old2') then 1 else 0 end) over (partition by user) as cnt_old
      from #UserValues uv
     ) uv
where cnt_old = 0 or source <> 'new'
order by row_number() over (partition by user order by value desc);

Upvotes: 0

D&#225;vid Laczk&#243;
D&#225;vid Laczk&#243;

Reputation: 1101

I think you should consider setting up an XREF table to define which source is what priority, for a possible more complicated priorisation in the future. I do it with a temp table:

CREATE TABLE #SourcePriority
(
    [Source]         VARCHAR(10),
    [SourcePriority] INT
)
INSERT INTO #SourcePriority VALUES
('old1', 1),
('old2', 1), 
('new',  2)

You might also create a View to look up the SourcePriority to the original table. I do it wit a CTE + possible implementation how to look up the top priority with the highest value:

;WITH CTE as (
    SELECT s.[SourcePriority], u.[Value], u.[User]
    FROM   #UserValues as u
        INNER JOIN #SourcePriority as s on u.[Source] = s.[Source]
)
SELECT MAX (v.[Value]) as [Value], v.[User]
FROM (
    SELECT MIN ([SourcePriority]) as [TopPriority], [User]
    FROM   cte
    GROUP BY [User]
    ) as s
    INNER JOIN cte as v
        ON s.[User] = v.[User] and s.[TopPriority] = v.[SourcePriority]
GROUP BY v.[User]

Upvotes: 1

George Joseph
George Joseph

Reputation: 5922

with raw_data
      as (
    select row_number() over(partition by a.[user] order by a.value desc) as rnk
          ,count(case when a.source in('old1','old2') then 1 end) over(partition by a.[user]) as cnt_old 
          ,a.*
      from uservalues a
         )
        ,curated_data  
         as(select *
                  ,row_number() over(partition by rd.[user] order by rd.value desc) as rnk2
             from raw_data rd
            where 0 = case when rnk=1 and source='new' and cnt_old>0 then 1 else 0 end 
           )
    select *
      from curated_data
     where rnk2=1

I am doing the following

  1. raw_data ->First i get rank the values on the basis of max available value per user. Also i get to check if the user has any records which are pegged at old1 or old2 in the source column

  2. curated_data ->i eliminate records which have the highest value(rnk=1) as new if they have cnt_old >0. Also now i rank(rnk2) the records on the highest value available from this result set.

  3. I select the highest available value from curated_data(ie rnk2=1)

Upvotes: 1

Shawn Pence
Shawn Pence

Reputation: 181

You can solve it fairly easily without resorting to window functions. In this case, you need the maximum value where ((not new) OR (there isn't an old1 or old2 entry)).

Here's a query that works correctly with your sample data:

SELECT
    MAX(U1.[Value]) as 'Value'
    ,U1.[User]
FROM
    #UserValues U1
WHERE
    U1.[Source] <> 'new' 
    OR NOT EXISTS (SELECT * FROM #UserValues U2 WHERE U2.[User] = U1.[User] AND U2.[Source] IN ('old1','old2'))
GROUP BY U1.[User]

Upvotes: 2

Yogesh Sharma
Yogesh Sharma

Reputation: 50163

You can use priorities order by with row_number() :

select top (1) with ties uv.*
from #UserValues uv
order by row_number() over (partition by [user] 
                            order by (case when source = 'old2' then 1 when source = 'old1' then 2 else 3 end), value desc 
                           );

However, if you have only source limited with 3 then you can also do :

. . . 
order by row_number() over (partition by [user] 
                            order by (case when source = 'new' then 2 else 1 end), value desc 
                           )

Upvotes: 1

Related Questions