SF Developer
SF Developer

Reputation: 5384

SQL Server - Selecting list of keywords and synonymous

I have two tables:

Keywords

where I store unique keywords.

CREATE TABLE [dbo].[Keywords]
[KeywordID] [int] IDENTITY(1,1) NOT NULL,
[Description] [varchar](200) NOT NULL

select * from Keywords   

  1 MVC  
  2 HTML  
  3 C#  
  4 ASP.NET MVC  
  5 MVC3

KeywordSynonymous

where I point certain keywords to be synonymous of others.

CREATE TABLE [dbo].[KeywordSynonymous]
    [KeywordID] [int] NOT NULL,
    [KeywordSynonymousID] [int] NOT NULL

Both fields are FK to the Keywords table and both combined fields are used as PK on this table.

In here I would to state that 'MVC' and 'MVC3' are synonymous, and maybe that 'MVC3' and 'ASP.NET MVC' are also synonymous.

select * from KeywordSynonymous  

1 5  
5 4  

CONCEPTS

1)

If Keyword 'MVC' is synonymous of 'MVC3'
and 'MVC3' is synonymous of 'ASP.NET MVC'

then conceptually MVC is ALSO synonymous of 'ASP.NET MVC'

2)

If Keyword 'MVC' is synonymous of 'MVC3'

then it's also true the VICEVERSA and that 'MVC3 is synonymous of 'MVC'

QUESTION

Imagine on my website I'm doing a search and the user could type anything, but for our example he could type 'MVC' or 'MVC3' ...

How can I get with one SQL statement ALL the possible synonymous ensuring that both Concept 1 and 2 are met?

Meaning that:

>> if the user types 'MVC',        my sql should return 'MVC, MVC3', 'ASP.NET MVC'.  
>> if the user types 'MVC3',       my sql should return 'MVC, MVC3', 'ASP.NET MVC'.  
>> if the user types 'ASP.NETMVC', my sql should return 'MVC, MVC3', 'ASP.NET MVC'.  

================================================================
UPDATE
I feel I have to add a bit about the website i am developing. It's a market place where young professionals will be able to sell their services using new ways to promote themselves.

Since we want to allow any profession, I cannot foresee at this time what "keywords" will define better each profession. So I will allow the users to define those keywords.

My problem is that I need to allow UserX to search for these young professionals by profession and keywords. I need to allow these users to match their searched keywords to existing keywords so that current and future searches will automatically match the right profiles.

That's why i don't have all the keywords in advance and surely cannot identify future keywords and their respective synonymous. I also cannot expect the users to match ALL existing keywords to all relevant ones ...so that's why i need Concept 1 working.

================================================================
STACKOVERFLOW TAGS
The Keyword's module should work very similar to StackOverflow Tags (Keywords) where if I set the TAGS to be SQL, you guys that are searching for TSQL or SQL SERVER ...should also see this post.

:-)

Upvotes: 1

Views: 852

Answers (5)

SF Developer
SF Developer

Reputation: 5384

OK, so how about this one:

DECLARE @TempKeywordID TABLE (KeywordID int)
INSERT INTO @TempKeywordID (KeywordID)(select KeywordID from Keywords where [Description] = @SearchKeyword)

DECLARE @intFlag INT
SET @intFlag = 1

WHILE (@intFlag <=(Select Count(KeywordSynonymousID) from KeywordSynonymous)) --Loop for all records in KeywordSynonymous
BEGIN
    INSERT INTO @TempKeywordID (KeywordID)(Select KeywordSynonymousID from KeywordSynonymous where KeywordID in (Select KeywordID from @TempKeywordID))
    INSERT INTO @TempKeywordID (KeywordID)(Select KeywordID from KeywordSynonymous where KeywordSynonymousID in (Select KeywordID from @TempKeywordID))    

    SET @intFlag = @intFlag + 1
END

SELECT * FROM Keywords WHERE KeywordID IN (SELECT * FROM @TempKeywordID)

Upvotes: 0

Jose Rui Santos
Jose Rui Santos

Reputation: 15319

You should definitely use Common Table Expressions. This is ideal solution for your problem, because it does not change your current DB schema and most importantly, CTE is an elegant and the logical solution due to the recursiveness that your KeywordSynonymous table has.

To accomplish this, its better to first create a view that selects all rows in KeywordSynonymous in both directions. In your case, this table returns the rows

select * from KeywordSynonymous  

1 5  
5 4  

and what the view below will do is to show

select * from KeywordSynonymousAll   

1 5      0   
2 NULL   0
3 NULL   0
4 NULL   0
4 5      1
5 1      1
5 4      0

This view is the data structure that will simplify the recursive query. It adds a third column to identify when an reversion has been made. This is required to satisfy your concept number 2.

So, here it is the view:

create view KeywordSynonymousAll as
    select KeywordID, KeywordSynonymousID, 0 as reversed
      from KeywordSynonymous
     union
    select K.KeywordID, null as KeywordSynonymousID, 0 as reversed
      from Keywords K
     where not exists(select null
                        from KeywordSynonymous
                       where KeywordID = K.KeywordID)
     union
     select KeywordSynonymousID, KeywordID, 1 as reversed
       from KeywordSynonymous

And the query

declare @search varchar(200);

set @search = 'MVC3'; -- TEST HERE for different search keywords

with Synonymous (keywordID, SynKeywordID) as ( 

    -- initial state: Get the keywordId and KeywordSynonymousID for the description as @search
    select K.keywordID, KS.KeywordSynonymousID
      from Keywords K
     inner join KeywordSynonymous KS on KS.KeywordID = K.keywordId
     where K.Description = @search

    union all

    -- also initial state but with reversed columns (because we want lookup in both directions)
    select KS.KeywordSynonymousID, K.keywordID
      from Keywords K
     inner join KeywordSynonymous KS on KS.KeywordSynonymousID = K.keywordId
     where K.Description = @search

     union all

    select S.SynKeywordID, KS.KeywordSynonymousID
      from Synonymous S
     inner join KeywordSynonymousAll KS on KS.KeywordID = S.SynKeywordID
     where KS.reversed = 0 -- to avoid infinite recursion

     union all

    select KS.KeywordSynonymousID, S.SynKeywordID 
      from Synonymous S
     inner join KeywordSynonymousAll KS on KS.KeywordID = S.KeywordID
     where KS.reversed = 1 -- to avoid infinite recursion

) 

-- finally output the result
select distinct K.Description
  from Synonymous S
 inner join Keywords K on K.KeywordID = S.keywordID

For set @search = 'MVC3', the result set is

  ASP.NET MVC
  MVC
  MVC3

The same result set happens for set @search = 'MVC' and set @search = 'ASP.NET MVC'

For set @search = 'C#' and set @search = 'HTML' you get nothing

EDIT

In my previous post, I said that result set would be empty for C# and HTML. If you want also to return those values, then change the last part of query to:

-- finally output the result
select distinct T.Description
  from (
    select K.Description
      from Synonymous S
     inner join Keywords K on K.KeywordID = S.keywordID

    union

    select Description
      from Keywords
     where Description = @search) T

Now, for set @search = 'C#', the result set is

  C#

and for set @search = 'HTML', the result set is

  HTML

Hope this helps

Upvotes: 3

ypercubeᵀᴹ
ypercubeᵀᴹ

Reputation: 115530

Beacuse of your conditions (concepts), the Synonymous table is not normalized. This is main source of your problem and the complex queries/triggers needed to solve it.

I would keep the Keyword table:

CREATE TABLE [dbo].[Keywords]
[KeywordID] [int] IDENTITY(1,1) NOT NULL,
[Description] [varchar](200) NOT NULL

select * from Keywords   

  1 MVC  
  2 HTML  
  3 C#  
  4 ASP.NET MVC  
  5 MVC3
  6 C sharp

and make the Synonymous table differently:

CREATE TABLE [dbo].[KeywordSynonymity]
    [SynonymityID] [int] NOT NULL,
    [KeywordID] [int] NOT NULL

select * from KeywordSynonymous  

1 1               --- for the 1 (MVC) and 5 (MVC3)
1 5               --- being synonymous
2 3               --- for the 3 (C#) and 6 (C sharp)
2 6               --- being synonymous

Then to add that MVC3 and ASP.NET MVC are also synonymous, you'll just have to add a row (1,4) in the Synonymity table.

If then - for unknown reasons but lets assume nonetheless that - you want to combine MVC3 and C# as synonymous, you'll have to change all rows with SynonymityID=2 (synonymous to C#) to =1 (synonymous to MVC).

But all your queries will be simpler as the table is normalized.

Upvotes: 2

Andrew Savinykh
Andrew Savinykh

Reputation: 26280

1 is called Symmetric Relation and 2 is called Transitive Relation.

I suggest that you maintain them as you add new keywords. You can do it this way. When a keyword is added to database, if there is no synonym to it already, designate it "master" keyword. Otherwise link the new keyword to the existing master keyword.

Here is a stored procedure to add the new keywords this way:

CREATE PROCEDURE [dbo].[AddKeyword] 
    @newKeyword [varchar](200), 
    @synonymKeyword [varchar](200) = NULL
AS
BEGIN
    SET NOCOUNT ON;

    set transaction isolation level serializable

    begin transaction

        if EXISTS (select 1 from Keywords where [Description] = @newKeyword)
        begin
            commit transaction
            return
        end

        declare @masterKeywordId int

        select 
            @masterKeywordId = ISNULL(KeywordSynonymous.KeywordID, Keywords.KeywordID) 
        from
            Keywords
        left join
            KeywordSynonymous
        on
            Keywords.KeywordID = KeywordSynonymous.KeywordSynonymousID
        where
            [Description] = @synonymKeyword

        insert into Keywords VALUES (@newKeyword)

        if @masterKeywordId is not null
            insert into KeywordSynonymous VALUES (@masterKeywordId,SCOPE_IDENTITY())

    commit transaction

END

In this stored procedure you pass a new keyword to add and optionally you also pass a known synonym. This synonym does not have to be "master". If it's present it's "master" keyword id will be looked up, and the newly created keyword will be linked against that "master" id.

And this is how you select them all in the end:

CREATE PROCEDURE [dbo].[GetSynonymKeywords]
    @keyword [varchar](200)
AS
BEGIN
    SET NOCOUNT ON;

    declare @masterKeywordId int

    select 
        @masterKeywordId = ISNULL(KeywordSynonymous.KeywordID, Keywords.KeywordID) 
    from
        Keywords
    left join
        KeywordSynonymous
    on
        Keywords.KeywordID = KeywordSynonymous.KeywordSynonymousID
    where
        [Description] = @keyword

    select 
        KeywordId,[Description]
    from
        Keywords
    where
        KeywordId = @masterKeywordId
    union
    select 
        Keywords.KeywordId,[Description]
    from
        KeywordSynonymous
    join
        Keywords
    on
        KeywordSynonymous.KeywordSynonymousID = Keywords.KeywordId
    where
        KeywordSynonymous.KeywordId = @masterKeywordId

END

This stored procedure first finds the keyword ID given a passed keyword. Then it looks up the "master" keyword for this id. Then it returns the master keyword and all keywords that are synonyms to this master keyword.

Example of adding new words:

EXEC [dbo].[AddKeyword] @newKeyword = N'MVC'
EXEC [dbo].[AddKeyword] @newKeyword = N'ASP.NET MVC',   @synonymKeyword = 'MVC'
EXEC [dbo].[AddKeyword] @newKeyword = N'MVC3',  @synonymKeyword = 'ASP.NET MVC'

Note that in the third line you could have specified 'MVC' as the synonym, it would have worked just as well.

Example of retrieving keywords:

[dbo].[GetSynonymKeywords]  @keyword = N'MVC3'
[dbo].[GetSynonymKeywords]  @keyword = N'ASP.NET MVC'
[dbo].[GetSynonymKeywords]  @keyword = N'MVC3'

All three returns the same list of values.

I'm putting isolation level to serialized in the AddKeyword SP to make sure there is no concurrency problem feel free to modify it according your concurrency model, serialized might not be appropriate for you.

You also can pull out GetMasterId (the block that appears in both SPs) into an UDF if you so inclined, or do any other modification that fits your particular scenario.

Upvotes: 1

Ron Harlev
Ron Harlev

Reputation: 16673

To achieve at least #1 you could use recursive Common Table Expressions (CTE)
with the definition here

Upvotes: 2

Related Questions