Reputation: 4103
I have a big data table that looks something like this
ID Marker Value1 Value2
================================
1 A 10 11
1 B 12 13
1 C 14 15
2 A 10 11
2 B 13 12
2 C
3 A 10 11
3 C 12 13
I want to search this data by the following data, which is user input and not stored in a table:
Marker Value1 Value2
==========================
A 10 11
B 12 13
C 14 14
The result should be something like this:
ID Marker Value1 Value2 Match?
==========================================
1 A 10 11 true
1 B 12 13 true
1 C 14 15 false
2 A 10 11 true
2 B 13 12 true
2 C false
3 A 10 11 true
3 C 12 13 false
And ultimately this (the above table is not necessary, it should demonstrate how these values came to be):
ID Matches Percent
========================
1 2 66%
2 2 66%
3 1 33%
I'm searching for the most promising approach to get this to work in SQL (PostgreSQL to be exact).
My ideas:
I'm not satisified with either approach, hence the question. How can I compare two tables like these efficiently?
Upvotes: 0
Views: 74
Reputation:
The user input can be supplied using a VALUES clause in a common table expression and that can then be used in a left join with the actual table.
with user_input (marker, value1, value2) as (
values
('A', 10, 11),
('B', 12, 13),
('C', 14, 14)
)
select d.id,
count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)),
100 * count(*) filter (where (d.marker, d.value1, d.value2) is not distinct from (u.marker, u.value1, u.value2)) / cast(count(*) as numeric) as pct
from data d
left join user_input u on (d.marker, d.value1, d.value2) = (u.marker, u.value1, u.value2)
group by d.id
order by d.id;
Returns:
id | count | pct
---+-------+------
1 | 2 | 66.67
2 | 2 | 66.67
3 | 1 | 50.00
Online example: https://rextester.com/OBOOD9042
Edit
If the order of the values isn't relevant (so (12,13) is considered the same as (13,12) then the comparison gets a bit more complicated.
with user_input (marker, value1, value2) as (
values
('A', 10, 11),
('B', 12, 13),
('C', 14, 14)
)
select d.id,
count(*) filter (where (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) is not distinct from (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2)))
from data d
left join user_input u on (d.marker, least(d.value1, d.value2), greatest(d.value1, d.value2)) = (u.marker, least(u.value1, u.value2), greatest(u.value1, u.value2))
group by d.id
order by d.id;
Upvotes: 1
Reputation: 1307
Try this:
CREATE TABLE #Temp
(
Marker nvarchar(50),
Value1 nvarchar(50),
Value2 nvarchar(50)
)
INSERT INTO #Temp Values ('A', '10', '11')
INSERT INTO #Temp Values ('B', '12', '13')
INSERT INTO #Temp Values ('C', '14', '14')
SELECT m.Id, m.Marker, m.Value1, m.Value2,
(Select
CASE
WHEN COUNT(*) = 0 THEN 'False'
WHEN COUNT(*) <> 0 THEN 'True'
END
FROM #Temp t
WHERE t.Marker = m.Marker and t.Value1 = m.Value1 and t.Value2 = m.Value2) as Matches
FROM [Test].[dbo].[Markers] m
ORDER BY Matches DESC
Drop TABLE #Temp
If it's exactly what you want, I try to solve the second part of it.
Upvotes: 1
Reputation: 48865
You can use a CTE to pre-compute the matches. Then a simple aggregation will do the trick. Assuming your parameters are:
Marker Value1 Value2
==========================
m1 x1 y1
m2 x2 y2
m3 x3 y3
You can do:
with x as (
select
id,
case when
marker = :m1 and (value1 = :x1 and value2 = :y1 or value1 = :y1 and value2 = :x1)
or marker = :m2 and (value1 = :x2 and value2 = :y2 or value1 = :y2 and value2 = :x2)
or marker = :m3 and (value1 = :x3 and value2 = :y3 or value1 = :y3 and value2 = :x3)
then 1 else 0 end as matches
from t
)
select
id,
sum(matches) as matches,
100.0 * sum(matches) / count(*) as percent
from x
group by id
Upvotes: 1