Reputation: 596
To put this work in context... I'm trying to filter a database of objects and build descriptions which can be verbalized for a speech UI. To minimise the descriptions I want to find the shortest way to describe an object, based on the idea of Grices Maxims.
It's possible in code by iterating through the records, and running through all permutations, but I keep thinking there ought to be a way to do this in SQL... so far I haven't found it. (I'm using PostGRES.)
So I have a table that looks something like this:
id colour position height
(int) (text) (text) (int)
0 "red" "left" 9
1 "red" "middle" 8
2 "blue" "middle" 8
3 "blue" "middle" 9
4 "red" "left" 7
There are two things I wish to find based on the attributes (excluding the ID).
a) are any of the records unique, based on the minimum number of attributes? => e.g. record 0 is unique based on colour and height => e.g. record 1 is the only red item in the middle => e.g. record 4 is unique as its the only one which has a height of 7
b) how is a particular record unique?
=> e.g. how is record 0 unique? because it is the only item with a colour red, and height of 9 => e.g. record 4 is unique because it is the only item with a height of 7
It may of course be that no objects are unique based on the attributes which is fine.
+++++++++++++++++++++++++
Answer for (a)
So the only way I can think to do this in SQL is to start off by testing a single attribute to see if there is a single match from all records. If not then add attribute 2 and test again. Then try attributes 1 and 3. Finally try attributes 1,2 and 3.
Something like this:-
single column test:
select * from griceanmaxims
where height=(Select height from griceanmaxims
group by height
having (count(height)=1))
or
relpos=
(Select relpos
from griceanmaxims
group by relpos
having (count(relpos)=1))
or
colour=
(Select colour
from griceanmaxims
group by colour
having (count(colour)=1))
double column tests:
(Select colour,relpos
from griceanmaxims
group by colour,relpos
having (count(colour)=1))
(Select colour,height
from griceanmaxims
group by colour,height
having (count(colour)=1))
etc
++++++++
I'm not sure if there's a better way or how to join up the results from the double column tests.
Also if anyone has any suggestions on how to determine the distinguishing factors for a record (as in question b), that would be great. My guess is that (b) would require (a) to be run for all of the field combinations, but I'm not sure if there's a better way.
Thanks in advance for any help on this one....
Upvotes: 0
Views: 432
Reputation: 6626
I like the idea of attacking the problem using a General Purpose Language eg C#:
1) Iterate through and see if any have 1 attribute which is unique eg ID = 4, which is unique because height is 7. Take ID 4 out of the 'doing' collection, and put into 'done' collection with appropriate attribute
Use a unit testing tool eg MSUNIT to prove the above works
2) Try and extend to n attibutes
Unit Test
3) See if any can be unique with 2 attributes. Take those IDs out of doing and into done with the pairs of attributes
Unit Test
4) Extend to m attributes
Unit Test
3) Refactor maybe using recursion
Hope this helps.
Upvotes: 1