Reputation: 431
I have a statement that SELECTs distinct rows from a table with a two value index
SELECT distinct Reference, Value1, Value2, Value3, Value4 FROM [tblHistory]
Where Reference is an index with another field "Project". For a particular system this data is inserted into another table using only Reference as the index because Value1 through Value4 SHOULD always be the same for the same Reference - however in about 1/500 it is not.
In the case where there is a duplicate Reference AND differences in one or more of the Value1-Value4 fields I need to pick the row with the most completed Value1-Value4 fields as they are often NULL. If all instances have the same number of populated columnsI can return the first row found.
Other than using temporary tables and code like
case when Value1 is null then 1 else 0 end
+ case when Value2 is null then 1 else 0 end
+ case when Value3 is null then 1 else 0 end
+ case when Value4 is null then 1 else 0 end
as CountOfNulls
Is there a way to filter the data so I only get the most populated row?
I'm running MS SQL Server 2000.
Upvotes: 3
Views: 2634
Reputation: 431
Well it's not a great way to do it, but I have put together this code and it works:
SELECT distinct Reference, Value1, Value2, Value3, Value4
FROM [tblHistory]
WHERE Reference+cast(4-(case when Value1 is null then 1 else 0 end
+ case when Value2 is null then 1 else 0 end
+ case when Value3 is null then 1 else 0 end
+ case when Value4 is null then 1 else 0 END) AS varchar) IN (
SELECT myref + CAST(MAX(CountOfNonNulls) AS VARCHAR) FROM
(
SELECT myref, 4-(case when Value1 is null then 1 else 0 end
+ case when Value2 is null then 1 else 0 end
+ case when Value3 is null then 1 else 0 end
+ case when Value4 is null then 1 else 0 end)
as CountOfNonNulls
FROM [tblHistory]
)l
GROUP BY Reference
)
I don't actually have this table structure so I haven't tested it but it seems to work. The idea is to make a "new" key by adding the highest CountOfNonNulls to the Reference field and using that to limit the select - it means the nasty CASE code is run twice but the other filters I have (not shown) limit the population to roughly 80 rows in my system so I can live with this.
I have yet to see what it would do if there were two rows with the same CountOfNonNulls value but different Value1-Value4 fields - I think it would break. In that case I would probably add the Value1-Value4 fields to my "new" key but that's a bit silly.
Any suggestions of improvement would be much appreciated!
Upvotes: 0
Reputation: 11022
You could look into the Coalesce function but honestly I would probably do the Case statement as you have above.
What reason would you not want to use it?
According to the comments, the table is more than just 4 value fields. But the fear is that a temporary table keeping the highest count of nulls will be necessary.
I feel that the Case solution, potentially implemented into a view is still a viable and good solution.
Upvotes: 1
Reputation: 1325
-- count() will not include NULL, so we can avoid making complex conditions
;
with
sum_cnt
(
Reference,
cnt
)
as
(
select
Reference,
count(Value1) + count(Value2) + count(Value3) + count(Value4)
from
tblHistory
group by
Reference
)
select top 1
Reference
from
sum_cnt
order by
cnt desc
Upvotes: 1
Reputation: 3368
This is just a hunch, since I haven't seen your database, but it looks like this is one table that should actually be two. As I said, this may have been done for performance or other legit reasons, but the way you need to report over it suggests this table should be split.
Nonetheless, if I had no other alternative, I'd create a column with the count of filled in columns for the row, and programmatically update it whenever the particular record is updated.
Upvotes: 1