Return only first distinct row from each join

Question

The scenario is simple. I have 4 tables, A -table, B -table, C1 -table and C2 -table. A is a root level table, B references A, and C1 and C2 reference B. But each B.ID can only be referenced by either C1 or C2, never both. The results are exported to a .CSV -file which is then used for a variety of purposes, and the question here has to do with readability as well as making it easier to manage the information in external software.

I wrote a query that returns all data in all 4 tables keeping the relations intact, ordering them by A, B, C1 and C2.

SELECT A.*, B.*, C1.*, C2.*
FROM A
JOIN B 
LEFT JOIN C1
LEFT JOIN C2
ORDER BY A.ID, B.ID, etc.

And got this:

A.ID | B.ID | C1.ID | C2.ID
    1|     1|      1|  NULL
    1|     1|      2|  NULL
    1|     2|      1|  NULL
    1|     2|      2|  NULL
    1|     2|      3|  NULL
    2|     1|   NULL|     1
    2|     1|   NULL|     2
....

Now, the question here is this: How do I return only the first distinct row for each join, so that the resultset doesn't get clogged with redundant data. Basically, the result above should produce this:

A.ID | B.ID | C1.ID | C2.ID
    1|     1|      1|  NULL
     |      |      2|  NULL
     |     2|      1|  NULL
     |      |      2|  NULL
     |      |      3|  NULL
    2|     1|   NULL|     1
     |      |   NULL|     2
....

I can probably do this by making each join a subquery and partitioning the results by rank, or alternatively creating a temporary table and slam the results there with the required logic, but since this will be used in a console app, I'd like to keep the solution as clean, simple and optimized as possible.

Any ideas?

podiluska · Accepted Answer

This is reporting / formatting, not data, so it should be handled by the application, not by SQL.

That said, this will produce something close to your requirements

select 
    case arn when 1 then convert(varchar(10),aid) else '' end as aid,
    case brn when 1 then convert(varchar(10),bid) else '' end as bid,
    case crn when 1 then convert(varchar(10),c1id) else '' end as c1id,
    c2id        
from
(    
           select a.id aid, b.id bid, c1.id c1id, c2.id c2id,
    ROW_NUMBER() over(partition by a.id order by a.id,b.id,c1.id,c2.id) arn,
    ROW_NUMBER() over(partition by a.id,b.id order by a.id,b.id,c1.id,c2.id) brn,
    ROW_NUMBER() over(partition by a.id,b.id,c1.id order by a.id,b.id,c1.id,c2.id) crn
           FROM A 
           JOIN B  
           LEFT JOIN C1 
           LEFT JOIN C2 

) v

Return only first distinct row from each join

Answers (1)

Related Questions