Reputation: 1
I have a large table (20,000,000+ rows) of this format:
id ref feature val
1 a pos pc
2 a gen m
3 a syll Cv
4 b pos pc
5 b L5 harry
6 c syll Cv
7 d gen m
How do I select a list of refs that satisfy multiple pairs of feature+val?
E.g. Select distinct ref where feature='pos' and val = 'pc'
, gives a,b
Select distinct ref where feature = 'gen' and val= 'm'
gives a,d
but how do I return the combined refs a,b and d, for those refs having feature='pos' and val = 'pc'
with those refs having feature = 'gen' and val= 'm'
?
I need to be able to extend the solution to many such pairs in the same search.
The search needs to be fast, so advice on indexes would be helpful as well.
Should I select from the same table N times, one for each feature+val pair? And if so, how?
Upvotes: 0
Views: 242
Reputation: 11130
sqlite> CREATE TABLE t (id INT, ref VARCHAR, feature VARCHAR, val VARCHAR);
sqlite>
sqlite> INSERT INTO t (id, ref, feature, val) VALUES (1, "a", "pos", "pc");
sqlite> INSERT INTO t (id, ref, feature, val) VALUES (2, "a", "gen", "m");
sqlite> INSERT INTO t (id, ref, feature, val) VALUES (3, "a", "syll", "Cv");
sqlite> INSERT INTO t (id, ref, feature, val) VALUES (4, "b", "pos", "pc");
sqlite> INSERT INTO t (id, ref, feature, val) VALUES (5, "b", "L5", "harry");
sqlite> INSERT INTO t (id, ref, feature, val) VALUES (6, "c", "syll", "Cv");
sqlite> INSERT INTO t (id, ref, feature, val) VALUES (7, "d", "gen", "m");
sqlite>
sqlite> SELECT DISTINCT ref FROM t WHERE (feature = 'pos' AND val = 'pc') OR (feature = 'gen' and val= 'm');
a
b
d
sqlite>
the running time should be linear, 20 million rows may seem big, but it shouldn't much of an issue with todays hardware and most implementations, indexing agains the filter types might improve performance, making sure your rows are fixed width may also have slight improvement.
If you need more configurations simply add them using OR
well you get the idea, I would recommend testing it, and seeing how long the query takes, I seriously doubt it would be to long, though I don't know the system nor the implementation you are using, so I can't really say much.
Upvotes: 0
Reputation: 58441
Typically, you would use an OR
statement to add those additional clauses
SELECT DISTINCT ref
FROM YourTable
WHERE (feature = 'pos' AND val = 'pc')
OR (feature = 'gen' AND val = 'm')
...
You might be able to simplify this a bit by using a persistent calculated column on concatenating both feature
and val
.
SELECT DISTINCT ref
FROM YourTable
WHERE (featureval = 'pospc')
OR (featureval = 'genm')
but the chaining of OR
statements remain.
Indexes
Each statements performance has to be evaluated by looking at the execution plan but for given statement, a Covering Index on (feature, val, ref)
wouldn't hurt.
Upvotes: 0
Reputation: 6034
You can select combinations like that using 'OR':
Select distinct ref
from table
where ( feature='pos' and val = 'pc')
or ( feature = 'gen' and val= 'm' )
If that's going to get too unwieldy, you probably want to move your feature / val combinations into a different table and do a join on that.
If you have a join table called join_table (now there's an inspired name!) with feature and val columns, you could do something like this:
select distinct ref
from table t
inner join join_table j on t.feature = j.feature and t.val = j.val
Upvotes: 1