user6048670
user6048670

Reputation: 2887

Best way to design a thesaurus in a SQL database?

I was asked in a job interview to design the database for a dictionary that keeps tracks of synonyms. The synonym part was a later question in the interview. I had a table like

    Words
=============
id |  ...
=============
 1 |   .
 2 |   .
 . |   .
 . |   .

and so to relate words to words I made a table

       Synonyms 
======================
 wordA_id | wordB_id
======================
     .    |    .
     .    |    . 
     .    |    .

For example, if words with id equal to 1, 69 and 72 are synonyms then

     1    |    69
     69   |    72

might be an entry in Synonyms. But so might

     1    |    69
     1    |    72

I realized. And the pairing order doesn't matter, so

     69   |   1
     1    |   72

words just as well. And we probably don't want duplicate records of any order, so something like

     69   |   1
     1    |   72
     69   |   1

of

     69   |   1
     1    |   72
     1    |   69

I'm wondering the best way to configure the database to ensure all this. What I suggested (in the interview) is inserting synynoms through a sproc that checks all this, like

CREATE PROCEDURE InsertSynonym
   @idA INT,
   @idB INT
AS
   -- make @idA the smaller if it isn't already
   IF @idA > @idB 
   BEGIN
      @DECLARE @temp INT
      SET @temp = @idA
      @idA = @idB
      @idB = @temp
   END
   -- insert if not already there
   IF NOT EXISTS (SELECT * FROM Synonyms WHERE wordA_id=@idA)
   BEGIN
      INSERT INTO Synonyms (wordA_id,wordB_id) VALUES (@idA,@idB)
   END

Upvotes: 1

Views: 1561

Answers (2)

Zohar Peled
Zohar Peled

Reputation: 82474

There is one bug I see but it's not in syntax, it's in logic. The IF NOT EXISTS should contain in it's where clause both wordA_Id and wordB_Id.
Also, it's not clear from your description of the database, but the words in words table should have a unique index on them, as well as the combination of both columns in the Synonyms table.

I'm not sure if a better database design exists, but there is another option and that's using a self-reference in the words table. However, this would require the self-reference column to be nullable, and while it is perfectly supported by all major databases, it's frowned upon.
To answer your question about xor swap - it is possible in t-sql:

DECLARE @A int = 5, @B int = 7;

SELECT @A as A, @B as B

SELECT  @A = @A ^ @B,
        @B = @A ^ @B,
        @A = @A ^ @B

SELECT @A as A, @B as B

results:

A           B
----------- -----------
5           7


A           B
----------- -----------
7           5

Upvotes: 1

Eralper
Eralper

Reputation: 6612

Maybe you can add a new field on Word table named SynonymId referencing a new table named Synonyms

In this Synonyms table you can keep SynonymId (non-unique), WordId

So when you want to get the list of synonyms of a word, read the field SynonymId of that row

Query Synonyms with same SynonymId joining Words over WordId

Here is how to select synonyms list can be queried with table structure and sample data

create table words(wordid int, word varchar(100), synonymid int)
create table synonyms(synonymid int, wordid int)
insert into words values (1, 'synonym', 1), (2, 'equivalence', 1), (3, 'opposite', null), (4, 'sameness', 1), (5, 'similarity',1 )
insert into synonyms values(1,1),(1,2),(1,3),(1,4)


declare @word varchar(100) = 'sameness'

SELECT
@word as word,  STUFF(
    (
select
    ',' + sw.word
from words w
left join [synonyms] s on s.synonymid = w.synonymid
left join words sw on s.wordid = sw.wordid
where
    w.word = @word
FOR XML PATH('') 
), 1, 1, ''
  ) As synonyms

Here is how to concatenate strings in SQL representing synonyms

Upvotes: 0

Related Questions