Roland P
Roland P

Reputation: 403

Eliminating duplicate rows with precedence

I am working on a stored procedure that is combining history with projections. I have a bit column (PHP) that specifies if the projection has precedence over the history or not. I also have a column that specifies if the data came from the history or projection table. The output of my stored procedure looks like this:

CaseId     Year     Projection    PHP    Gas   Oil
  1        2004         0          1    
  1        2005         0          1    
  1        2005         1          1    
  1        2006         1          1    
  1        2007         1          1    
  1        2008         1          1    
  1        2009         1          1    
  2        2003         0          0    
  2        2004         0          0    
  2        2005         0          0    
  2        2005         1          0    
  2        2006         1          0    
  2        2007         1          0    
  2        2008         1          0    
  2        2006         1          0    

In this example, I would need to eliminate the second row because for CaseId 1 projection has precedence so the overlapping history date should be removed. Also, the fourth row for CaseId 2 should be deleted because history has precedence.

CaseId     Year     Projection    PHP    Gas   Oil  
  1        2004         0          1    
  1        2005         1          1    
  1        2006         1          1    
  1        2007         1          1    
  1        2008         1          1    
  1        2009         1          1    
  2        2003         0          0    
  2        2004         0          0    
  2        2005         0          0    
  2        2006         1          0    
  2        2007         1          0    
  2        2008         1          0    
  2        2006         1          0

I need to flag duplicate years within a CaseId and then compare the Projection and PHP columns and delete the row where they don't match.

Here is the query I am working with:

SELECT      rcl.ReportRunCaseId AS CaseId, 
            year(rce.EcoDate) as Year,
            1 as Projection,
            cpq.ProjectionHasPrecedence as PHP,
            rce.GrossOil as Oil,                
            rce.GrossGas as Gas    
  from  phdreports.PhdRpt.ReportCaseList_28 rcl 
         inner join phdreports.PhdRpt.RptCaseEco_28 rce on
            rce.ReportRunCaseId = rcl.ReportRunCaseId
         inner join dbo.caseQualifier cq on 
            cq.CorpScenarioId = 1 and 
            cq.CaseCaseId = rcl.ReportRunCaseId and 
            cq.CorpQualifierTypeId = 1
         inner join dbo.caseProjectionQualifier cpq on 
            cpq.CaseCaseId = rcl.ReportRunCaseId and 
            cpq.CorpQualifierId = cq.QualifierHasData 
where rcl.ReportRunCaseId <=2
group by year(rce.EcoDate), rcl.ReportRunCaseId, cpq.ProjectionHasPrecedence, rce.GrossGas, rce.GrossOil

union all

select      rmp.ReportRunCaseId AS CaseId, 
            year(rmp.EcoDate) as Year,
            0 as Projection,
            cpq.ProjectionHasPrecedence as PHP,
            rmp.GrossOil as Oil,
            rmp.GrossGas as Gas              
from PhdReports.PhdRpt.RptMonthlyProduction_50 rmp
        inner join dbo.caseQualifier cq on 
          cq.CorpScenarioId = 1 and 
          cq.CaseCaseId = rmp.ReportRunCaseId and 
          cq.CorpQualifierTypeId = 1
        inner join dbo.caseProjectionQualifier cpq on 
          cpq.CaseCaseId = rmp.ReportRunCaseId and 
          cpq.CorpQualifierId = cq.QualifierHasData 
where rmp.ReportRunCaseId <= 2
group by year(rmp.EcoDate), rmp.ReportRunCaseId, cpq.ProjectionHasPrecedence, rmp.GrossGas, rmp.GrossOil 

How can I eliminate the duplicate years where Projection and PHP do not match?

Upvotes: 2

Views: 180

Answers (2)

GarethD
GarethD

Reputation: 69759

The ROW_NUMBER() Function should help you here:

WITH Data AS
(   SELECT      rcl.ReportRunCaseId AS CaseId, 
                year(rce.EcoDate) as Year,
                1 as Projection,
                cpq.ProjectionHasPrecedence as PHP,
                rce.GrossOil as Oil,                
                rce.GrossGas as Gas    
      from  phdreports.PhdRpt.ReportCaseList_28 rcl 
             inner join phdreports.PhdRpt.RptCaseEco_28 rce on
                rce.ReportRunCaseId = rcl.ReportRunCaseId
             inner join dbo.caseQualifier cq on 
                cq.CorpScenarioId = 1 and 
                cq.CaseCaseId = rcl.ReportRunCaseId and 
                cq.CorpQualifierTypeId = 1
             inner join dbo.caseProjectionQualifier cpq on 
                cpq.CaseCaseId = rcl.ReportRunCaseId and 
                cpq.CorpQualifierId = cq.QualifierHasData 
    where rcl.ReportRunCaseId <=2
    group by year(rce.EcoDate), rcl.ReportRunCaseId, cpq.ProjectionHasPrecedence, rce.GrossGas, rce.GrossOil

    union all

    select      rmp.ReportRunCaseId AS CaseId, 
                year(rmp.EcoDate) as Year,
                0 as Projection,
                cpq.ProjectionHasPrecedence as PHP,
                rmp.GrossOil as Oil,
                rmp.GrossGas as Gas              
    from PhdReports.PhdRpt.RptMonthlyProduction_50 rmp
            inner join dbo.caseQualifier cq on 
              cq.CorpScenarioId = 1 and 
              cq.CaseCaseId = rmp.ReportRunCaseId and 
              cq.CorpQualifierTypeId = 1
            inner join dbo.caseProjectionQualifier cpq on 
              cpq.CaseCaseId = rmp.ReportRunCaseId and 
              cpq.CorpQualifierId = cq.QualifierHasData 
    where rmp.ReportRunCaseId <= 2
    group by year(rmp.EcoDate), rmp.ReportRunCaseId, cpq.ProjectionHasPrecedence, rmp.GrossGas, rmp.GrossOil
), Data2 AS
(   SELECT  *, 
            RowNum = ROW_NUMBER() OVER(PARTITION BY CaseId, Year 
                                        ORDER BY CASE WHEN PHP = Projection THEN 0 ELSE 1 END DESC, PHP DESC, Projection DESC)
    FROM    Data
)
SELECT  CaseId, Year, Projection, PHP, Oil, Gas
FROM    Data2
WHERE   RowNum - 1;

Considering only the last bit, as the first is just your query inside a common table expression:

RowNum = ROW_NUMBER() OVER(PARTITION BY CaseId, Year 
                            ORDER BY CASE WHEN PHP = Projection THEN 0 ELSE 1 END DESC, PHP DESC, Projection DESC)

Here, we give each caseId, year tuple a rank, ordering by whether or not PHP is equal to projection. Then the final part just limits the result to the first row for each tuple, so if a row exists where they are equal it will take that, if there are no rows where they are equal a row where they are not equal will be used.

You may need to add further criteria to the order by to ensure the results are deterministic, i.e. if you have two rows for the same caseId/Year where both PHP and projection are 1 in both, ensure that the same row is picked every time.

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269663

I don't know what your query has to do with the question. So, let me assume that you have a query that does:

select CaseId, Year, Projection, PHP, Gas, Oil 
from t

With this, you can do what you want using row_number():

select CaseId, Year, Projection, PHP, Gas, Oil
from (select CaseId, Year, Projection, PHP, Gas, Oil,
             row_number() over (partition by CaseId, Year
                                order by Projection + PHP desc
                               ) as seqnum
      from t
     ) t
where seqnum = 1;

This will prioritize rows based on the number of flags that are set. In your example for CaseId = 2, the two rows contain the same values. This will return one of those rows. If you want to choose between them, you need another column so specify precedence.

Upvotes: 1

Related Questions