John F
John F

Reputation: 1072

SQL - LAG to get previous value if condition using multiple previous columns satisfied

I have a table created by:

CREATE TABLE #test_table 
(
id INT
,EventName VARCHAR(50)
,HomeTeam VARCHAR(25)
,Metric INT
)

INSERT INTO #test_table VALUES
(1, 'Team A vs Team B', 'Team A', 5),
(2, 'Team A vs Team B', 'Team A', 7),
(3, 'Team C vs Team D', 'Team C', 6),
(4, 'Team Z vs Team A', 'Team Z', 8),
(5, 'Team A vs Team B', 'Team A', 9),
(6, 'Team C vs Team D', 'Team C', 3),
(7, 'Team C vs Team D', 'Team C', 1),
(8, 'Team E vs Team F', 'Team E', 2)

Which results in:

id  EventName           HomeTeam    Metric
------------------------------------------
1   Team A vs Team B    Team A      5
2   Team A vs Team B    Team A      7
3   Team C vs Team D    Team C      6
4   Team Z vs Team A    Team Z      8
5   Team A vs Team B    Team A      9
6   Team C vs Team D    Team C      3
7   Team C vs Team D    Team C      1
8   Team E vs Team F    Team E      2

A want to calculate a new column PreviousMetricN where N can be 1, 2, 3, ... which shows the previous value for Metric, but only if the HomeTeam was involved in the previous event. For example:

id  EventName           HomeTeam    Metric  PreviousMetric1 PreviousMetric2
------------------------------------------------------------------------
1   Team A vs Team B    Team A      5       NULL            NULL
2   Team A vs Team B    Team A      7       5               NULL
3   Team C vs Team D    Team C      6       NULL            NULL
4   Team Z vs Team A    Team Z      8       NULL            NULL
5   Team A vs Team B    Team A      9       8               7
6   Team C vs Team D    Team C      3       6               NULL
7   Team C vs Team D    Team C      1       3               6
8   Team E vs Team F    Team E      2       NULL            NULL

I have been trying variations of LAG with a new grouping variable in the PARTITION BY clause such as

LAG(Metric) OVER(Partition by (CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN 1 ELSE 0 END) ORDER BY id)

but without any success. How can this be done?

EDIT: I've also asked this question for Pandas here: Pandas shift - get previous value if multiple conditions satisfied

Upvotes: 4

Views: 2939

Answers (5)

Charlieface
Charlieface

Reputation: 71805

I see no answer here that uses window functions and a single scan of the table. We can do this query in a single scan as follows:

Let us assume you have the AwayTeam in another column.

If you don't have this yet and you wanted to parse it out of EventData:
We could use: SUBSTRING(EventData, CHARINDEX(' vs ', EventData) + 4)
I urge you to follow proper normalization and create this as a proper column in your table.

Our algorithm runs like this:

  1. Multiply out (unpivot) the two teams as separate rows, using CROSS APPLY
  2. Calculate the previous Metrics using LAG, partitioning by the merged Team column
  3. Filter back down the doubled up rows, so that we only get a single row for each of our original ones
SELECT id, HomeTeam, AwayTeam, Metric, Prev1, Prev2, Prev3
FROM (

  SELECT *
    ,Prev1 = LAG(Metric, 1) OVER (PARTITION BY v.Team ORDER BY id)
    ,Prev2 = LAG(Metric, 2) OVER (PARTITION BY v.Team ORDER BY id)
    ,Prev3 = LAG(Metric, 3) OVER (PARTITION BY v.Team ORDER BY id)
    -- more of these ......
  FROM test_table
  CROSS APPLY (VALUES (HomeTeam, 1),(AwayTeam, 0)) AS v(Team,IsHome)
) AS t

WHERE IsHome = 1
-- ORDER BY id  --if necessary

Importantly, we can do this without the use of multiple different sorts, partitions or ordering, and without the use of a self-join. Just a single scan.

Result:

id HomeTeam AwayTeam Metric Prev1 Prev2 Prev3
1 Team A Team B 5 (null) (null) (null)
2 Team A Team B 7 5 (null) (null)
3 Team C Team D 6 (null) (null) (null)
4 Team Z Team A 8 (null) (null) (null)
5 Team A Team B 9 8 7 5
6 Team C Team D 3 6 (null) (null)
7 Team C Team D 1 3 6 (null)
8 Team E Team F 2 (null) (null) (null)

Upvotes: 3

Tom
Tom

Reputation: 17

I believe this is what you are looking for:

;with cte as (
   select id
 , eventname
 , hometeam
 , metric
 , CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN LAG(Metric) OVER (Partition by HomeTeam ORDER BY id) ELSE NULL END previous from  #test_table 
)
select * ,CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN LAG(previous) OVER (Partition by HomeTeam ORDER BY id) ELSE NULL END previous2 
from cte
order by 1

Upvotes: 0

First through self join and common table expression I have ranked all the previous eventname containing hometeam. We can have PreviousMetric1 from previous match and we can use Lead() window function to obtain PreviousMetric2. Please check below query:

with cte as(
select a.id,a.eventname,a.hometeam,a.metric,b.metric PreviousMetric1,   
LEAD(b.metric)over (partition by a.id order by b.id desc) PreviousMetric2,
row_number()over(partition by a.id,a.hometeam order by b.id desc) rownum
from #test_table a
    left join #test_table b
        on charindex(a.hometeam,b.eventname)>0  and a.id>b.id
)select id,eventname,hometeam,metric,PreviousMetric1,PreviousMetric2  from cte 
where rownum=1

enter image description here

You can also have PreviousMetric3 applying Lead() with 2 as second parameter. In this way you can have any number of previous metric. It's faster comparing to any other approach.

;with cte as(
    select a.id,a.eventname,a.hometeam,a.metric,b.metric PreviousMetric1,   
    LEAD(b.metric)over (partition by a.id order by b.id desc) PreviousMetric2,
    LEAD(b.metric,2)over (partition by a.id order by b.id desc) PreviousMetric3,
    row_number()over(partition by a.id,a.hometeam order by b.id desc) rownum
    from #test_table a
        left join #test_table b
            on charindex(a.hometeam,b.eventname)>0  and a.id>b.id
    )select id,eventname,hometeam,metric,PreviousMetric1,PreviousMetric2 ,PreviousMetric3 from cte 
where rownum=1

enter image description here

Upvotes: 1

Lukasz Szozda
Lukasz Szozda

Reputation: 175806

Using OUTER APPLY and correlated subqueries:

SELECT *
FROM test_table c
OUTER APPLY (SELECT TOP 1 PreviousMetric1 = c2.Metric  
             FROM test_table c2 
             WHERE CHARINDEX(c.HomeTeam, c2.EventName)>0 
               AND c.id > c2.id
             ORDER BY id DESC) s1
OUTER APPLY (SELECT PreviousMetric2 = c2.Metric  
             FROM test_table c2 
             WHERE CHARINDEX(c.HomeTeam, c2.EventName)>0 
               AND c.id > c2.id
             ORDER BY id DESC OFFSET 1 ROWS FETCH NEXT 1 ROW ONLY) s2            
ORDER BY id;

db<>fiddle demo

Output:

+-----+-------------------+-----------+---------+------------------+-----------------+
| id  |    EventName      | HomeTeam  | Metric  | PreviousMetric1  | PreviousMetric2 |
+-----+-------------------+-----------+---------+------------------+-----------------+
|  1  | Team A vs Team B  | Team A    |      5  |                  |                 |
|  2  | Team A vs Team B  | Team A    |      7  |               5  |                 |
|  3  | Team C vs Team D  | Team C    |      6  |                  |                 |
|  4  | Team Z vs Team A  | Team Z    |      8  |                  |                 |
|  5  | Team A vs Team B  | Team A    |      9  |               8  |               7 |
|  6  | Team C vs Team D  | Team C    |      3  |               6  |                 |
|  7  | Team C vs Team D  | Team C    |      1  |               3  |               6 |
|  8  | Team E vs Team F  | Team E    |      2  |                  |                 |
+-----+-------------------+-----------+---------+------------------+-----------------+

Extedning with PreviousMetricN is a matter of adding corresponding OUTER APPLY sN with OFFSET N-1 ROWS FETCH ....

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270081

The logic seems to be:

lag(metric, <n>) over (partition by hometeam order by id)

I don't see why eventName is needed.

Upvotes: 1

Related Questions