Reputation: 6813
Basically, I have the following query (actually more complex, but I think this simplification is ok):
SELECT a, b, x
FROM table
output:
a | b | x
-----------
1 | 2 | 34
1 | 3 | 35
1 | 3 | 36
1 | 4 | 37
2 | 3 | 38
2 | 3 | 39
2 | 4 | 40
3 | 4 | 41
3 | 5 | 42
To count the number of occurrence of each "pair of a and b", I'm using GROUP BY:
SELECT a, b, COUNT(x) AS count
FROM table
GROUP BY a, b
ORDER BY count
output:
a | b | count
--------------
1 | 2 | 1
1 | 4 | 1
2 | 4 | 1
3 | 4 | 1
3 | 5 | 1
1 | 3 | 2
2 | 3 | 2
What bothers me is the multiple occurence of a and b. I would like to keep the "count" as it is, but remove every following row, if a or b was already in a previous row. It would be a nice to have, if it would also remove a row, if the value of "a" appeared in a previous row as "b" and vice versa.
Preferred expected output:
a | b | count
--------------
1 | 2 | 1
1 | 4 | 1 <- should not be in output since we had a=1
2 | 4 | 1 <- should not be in output since we had b=
3 | 4 | 1
3 | 5 | 1 <- should not be in output since we had a=3
1 | 3 | 2 <- should not be in output since we had a=1 / a=3
2 | 3 | 2 <- should not be in output since we had b=2 / a=3
Therefore, this:
a | b | count
--------------
1 | 2 | 1
3 | 4 | 1
Alternative expected output, if the above would be too complex:
a | b | count
--------------
1 | 2 | 1
1 | 4 | 1 <- should not be in output since we had a=1
2 | 4 | 1
3 | 4 | 1 <- should not be in output since we had b=4
3 | 5 | 1
1 | 3 | 2 <- should not be in output since we had a=1
2 | 3 | 2 <- should not be in output since we had a=2
Therefore, this:
a | b | count
--------------
1 | 2 | 1
2 | 4 | 1
3 | 5 | 1
Upvotes: 1
Views: 104
Reputation: 134
This query will give you the desired output.
DECLARE @id INT = 1,
@a INT,
@b INT,
@count INT
DECLARE @tbl TABLE
(
id INT IDENTITY(1,1),
a INT,
b INT,
count INT
)
INSERT INTO @tbl
SELECT a, b, COUNT(1) AS COUNT FROM dbo.myTable
GROUP BY a, b
ORDER BY COUNT,a,b
SELECT @count = COUNT(1) FROM @tbl
WHILE @id <= @count
BEGIN
SELECT TOP 1 @a = a,@b = b FROM @tbl WHERE id = @id
IF EXISTS(SELECT 1 FROM @tbl WHERE id < @id AND (a = @a OR b = @b))
DELETE @tbl WHERE id = @id
SET @id += 1
END
SELECT a,b,count FROM @tbl
Upvotes: 0
Reputation: 180058
This is rather a mess of a problem, but here's something to consider:
SELECT a, b, count
FROM (
SELECT a, b, count,
rank() over (partition by b order by count, a) as b_rank
FROM (
SELECT a, b, count,
rank() over (partition by a order by count, b) as a_rank
FROM (
SELECT a, b, COUNT(*) AS count
FROM t
GROUP BY a, b
ORDER BY count
) pc
) pc2
WHERE a_rank < 3
) pc3
WHERE b_rank = 1
Each a
value will appear at most twice in the results, but b
values will be unique. Some b
values appearing in low-count pairs may not be reflected in the results. There is a trade-off between possible duplication of a
and the number of b
values that may be missed altogether: allowing more duplicates of a
(by changing to, e.g, WHERE a_rank < 4
) reduces the number of b
values that may be missed.
Upvotes: 2