Neaox
Neaox

Reputation: 1972

Inner Join one table to another table multiple times

Say I have three tables:

User Table
{
    UserId INT,
    Username NVARCHAR
    ...
}

Questions
{
    QuestionId INT
    QuestionText NVARCHAR
}

Answers
{
    AnswerId INT,
    QuestionId INT,
    UserId INT,
    Answer NVARCHAR
}

This structure is obviously overly simplified, but for the purpose of this example it should suffice.

What would be the best way to select users who have specific answers for specific questions, for instance - Assuming the tables are populated with the following data:

User Table

UserId              Username                ...
--------------------------------------------------------------------------------------------------------
1                   User1                   ... 
2                   User2                   ... 
3                   User3                   ...
4                   User4                   ...
5                   User5                   ...
6                   User6                   ...
7                   User7                   ...
8                   User8                   ...
9                   User9                   ...
10                  User10                  ...
...                 ...                     ...

etc

Questions Table

QuestionId              QuestionText
--------------------------------------------------------------------------------------------------------
1                       What is your favorite color?
2                       What do you prefer cats or dogs?
3                       Do you prefer if it is too hot or too cold?
4                       What is your favorite season (Summer, Autumn (Fall), Winter, Spring)?
5                       How Old Are you?
...                     ...

etc

Answers Table

AnswerId                QuestionId              UserId          Answer
--------------------------------------------------------------------------------------------------------
1                       1                       1               Red 
2                       1                       2               Red
3                       1                       3               Blue
4                       1                       4               Green
5                       1                       5               Black
6                       2                       6               Cats
7                       2                       1               Dogs
8                       3                       1               Too Cold
9                       4                       1               Spring
10                      5                       1               22
11                      2                       4               Dogs
12                      3                       4               Too Hot
13                      3                       3               Too Cold
14                      5                       6               46
15                      1                       8               Purple

If I wanted to select users who liked dogs and red or purple and under the age of 50 etc

Would the best (most efficient) way to do this be to have multiple joins from the user table to the answers table (one for each answer condition required)

For example:

If I were wanting to get the users who liked dogs and the color red I could use the following MSSQL:

SELECT * 
FROM 
Users 
JOIN Answers As a1 
ON Users.UserId = a1.UserId 
JOIN Answers as a2 
ON Users.UserId = a2.UserId 
WHERE 
    (
        a1.QuestionId = 1 AND 
        a1.Answer = 'Red'
    ) AND 
    (
        a2.QuestionId = 2 AND 
        a2.Answer = 'Dogs'
    )

There could be numerous answer conditions.

Basically the question I am asking is what is the best way to write a query where you have conditions on multiple rows from the same table with the same columns...

Sorry if this is confusing, feel free to ask any questions I will try and answer them as best as I can...

Thanks.

Upvotes: 3

Views: 6404

Answers (2)

Cristian Lupascu
Cristian Lupascu

Reputation: 40506

There's also the option of using the PIVOT operator.

That's how the query you proposed above could be written:

select UserId, UserName
from (
  select
    u.UserId,
    u.UserName,
    case 
      when a.QuestionId = 1 then 'Color'
      when a.QuestionId = 2 then 'Animal'
      when a.QuestionId = 3 then 'Temperature'
      when a.QuestionId = 4 then 'Season'
      when a.QuestionId = 5 then 'Age'
    end as Question,
    a.Answer
  from Users u
  join Answers a on a.UserId = u.UserId
) as SourceTable
pivot (
  max(Answer)
  for Question in (
    [Color], 
    [Animal], 
    [Temperature], 
    [Season], 
    [Age])
) as pivotTable

where Animal = 'Dogs'
  and Color in ('red', 'purple')
  and Age < 50

Here's a link to test this online: http://www.sqlfiddle.com/#!3/5c960/23

Yes, the query looks cumbersome, but you could write it once and (assuming the questions don't change that often) just alter the where clause, which is pretty easy to write/read/understand/maintain (see the last three lines in the code block above).

Update:

For performance analysis, compare these two queries:

(after executing the queries, click the "View Execution Plan" link above the results to see what SQL does under the covers)

What I recommend though is that you run these queries on a database of your own, where you may have already created appropriate indexes and where the volume of data could be enough to produce a relevant result.

I'm no SQL performance expert, but I have a hunch that Ann L.'s solution could be more efficient and scale better to large volumes of data. But that, again, it's just a hunch; if you can perform a test on your environment you could see the actual results.

Upvotes: 3

Ann L.
Ann L.

Reputation: 13965

Your basic query looks just fine. As you get more elaborate, you would construct your WHERE clauses slightly differently depending on how your conditions need to be combined.

For example, in the example you provided, of either red or purple being acceptable answers, you could construct the WHERE clause like this:

WHERE (a1.QuestionId = 1 AND (a1.Answer IN ('Red','Purple')) 
  AND (a2.QuestionId = 2 AND a2.Answer = 'Dogs')

It gets more complicated if only certain sets of answers are acceptable, so if 'Red' and 'Dogs' or 'Purple' and 'Cats' are acceptable, it would look more like this:

WHERE 
  (
          (a1.QuestionId = 1 AND a1.Answer = 'Red')
      AND (a2.QuestionId = 2 AND a2.Answer = 'Dogs')
  )
  OR
  (
          (a1.QuestionId = 1 AND a1.Answer = 'Purple')
      AND (a2.QuestionId = 2 AND a2.Answer = 'Cats')
  )

If your conditions get more complicated, you might want to read Dynamic Search Conditions in T-SQL. While your conditions aren't dynamic, there's a lot of useful information there.

Finally, since it's easy to get confused with what question ID goes with what answers, particularly if they aren't nice, human-recognizable values, it can help to use CTEs to pre-select the answers:

WITH Colors
AS   (
    SELECT *
    FROM   Answers
    WHERE  QuestionID = 1
)
,    Animals
AS   (
    SELECT *
    FROM   Answers
    WHERE  QuestionID = 2
)
SELECT   *
FROM     Users 
   JOIN  Colors
       ON  Users.UserID = Colors.UserID
   JOIN  Animals
       ON  Users.UserID = Animals.UserID
WHERE   (
        Colors.Answer = 'Red'
    AND Animals.Answer = 'Dogs'
    )
    OR  (
        Colors.Answer = 'Purple'
    AND Animals.Answer = 'Cats'
    )

Upvotes: 3

Related Questions