Reputation: 1972
Say I have three tables:
User Table
{
UserId INT,
Username NVARCHAR
...
}
Questions
{
QuestionId INT
QuestionText NVARCHAR
}
Answers
{
AnswerId INT,
QuestionId INT,
UserId INT,
Answer NVARCHAR
}
This structure is obviously overly simplified, but for the purpose of this example it should suffice.
What would be the best way to select users who have specific answers for specific questions, for instance - Assuming the tables are populated with the following data:
User Table
UserId Username ...
--------------------------------------------------------------------------------------------------------
1 User1 ...
2 User2 ...
3 User3 ...
4 User4 ...
5 User5 ...
6 User6 ...
7 User7 ...
8 User8 ...
9 User9 ...
10 User10 ...
... ... ...
etc
Questions Table
QuestionId QuestionText
--------------------------------------------------------------------------------------------------------
1 What is your favorite color?
2 What do you prefer cats or dogs?
3 Do you prefer if it is too hot or too cold?
4 What is your favorite season (Summer, Autumn (Fall), Winter, Spring)?
5 How Old Are you?
... ...
etc
Answers Table
AnswerId QuestionId UserId Answer
--------------------------------------------------------------------------------------------------------
1 1 1 Red
2 1 2 Red
3 1 3 Blue
4 1 4 Green
5 1 5 Black
6 2 6 Cats
7 2 1 Dogs
8 3 1 Too Cold
9 4 1 Spring
10 5 1 22
11 2 4 Dogs
12 3 4 Too Hot
13 3 3 Too Cold
14 5 6 46
15 1 8 Purple
If I wanted to select users who liked dogs and red or purple and under the age of 50 etc
Would the best (most efficient) way to do this be to have multiple joins from the user table to the answers table (one for each answer condition required)
For example:
If I were wanting to get the users who liked dogs and the color red I could use the following MSSQL:
SELECT *
FROM
Users
JOIN Answers As a1
ON Users.UserId = a1.UserId
JOIN Answers as a2
ON Users.UserId = a2.UserId
WHERE
(
a1.QuestionId = 1 AND
a1.Answer = 'Red'
) AND
(
a2.QuestionId = 2 AND
a2.Answer = 'Dogs'
)
There could be numerous answer conditions.
Basically the question I am asking is what is the best way to write a query where you have conditions on multiple rows from the same table with the same columns...
Sorry if this is confusing, feel free to ask any questions I will try and answer them as best as I can...
Thanks.
Upvotes: 3
Views: 6404
Reputation: 40506
There's also the option of using the PIVOT operator.
That's how the query you proposed above could be written:
select UserId, UserName
from (
select
u.UserId,
u.UserName,
case
when a.QuestionId = 1 then 'Color'
when a.QuestionId = 2 then 'Animal'
when a.QuestionId = 3 then 'Temperature'
when a.QuestionId = 4 then 'Season'
when a.QuestionId = 5 then 'Age'
end as Question,
a.Answer
from Users u
join Answers a on a.UserId = u.UserId
) as SourceTable
pivot (
max(Answer)
for Question in (
[Color],
[Animal],
[Temperature],
[Season],
[Age])
) as pivotTable
where Animal = 'Dogs'
and Color in ('red', 'purple')
and Age < 50
Here's a link to test this online: http://www.sqlfiddle.com/#!3/5c960/23
Yes, the query looks cumbersome, but you could write it once and (assuming the questions don't change that often) just alter the where
clause, which is pretty easy to write/read/understand/maintain (see the last three lines in the code block above).
Update:
For performance analysis, compare these two queries:
(after executing the queries, click the "View Execution Plan" link above the results to see what SQL does under the covers)
What I recommend though is that you run these queries on a database of your own, where you may have already created appropriate indexes and where the volume of data could be enough to produce a relevant result.
I'm no SQL performance expert, but I have a hunch that Ann L.'s solution could be more efficient and scale better to large volumes of data. But that, again, it's just a hunch; if you can perform a test on your environment you could see the actual results.
Upvotes: 3
Reputation: 13965
Your basic query looks just fine. As you get more elaborate, you would construct your WHERE clauses slightly differently depending on how your conditions need to be combined.
For example, in the example you provided, of either red or purple being acceptable answers, you could construct the WHERE clause like this:
WHERE (a1.QuestionId = 1 AND (a1.Answer IN ('Red','Purple'))
AND (a2.QuestionId = 2 AND a2.Answer = 'Dogs')
It gets more complicated if only certain sets of answers are acceptable, so if 'Red' and 'Dogs' or 'Purple' and 'Cats' are acceptable, it would look more like this:
WHERE
(
(a1.QuestionId = 1 AND a1.Answer = 'Red')
AND (a2.QuestionId = 2 AND a2.Answer = 'Dogs')
)
OR
(
(a1.QuestionId = 1 AND a1.Answer = 'Purple')
AND (a2.QuestionId = 2 AND a2.Answer = 'Cats')
)
If your conditions get more complicated, you might want to read Dynamic Search Conditions in T-SQL. While your conditions aren't dynamic, there's a lot of useful information there.
Finally, since it's easy to get confused with what question ID goes with what answers, particularly if they aren't nice, human-recognizable values, it can help to use CTEs to pre-select the answers:
WITH Colors
AS (
SELECT *
FROM Answers
WHERE QuestionID = 1
)
, Animals
AS (
SELECT *
FROM Answers
WHERE QuestionID = 2
)
SELECT *
FROM Users
JOIN Colors
ON Users.UserID = Colors.UserID
JOIN Animals
ON Users.UserID = Animals.UserID
WHERE (
Colors.Answer = 'Red'
AND Animals.Answer = 'Dogs'
)
OR (
Colors.Answer = 'Purple'
AND Animals.Answer = 'Cats'
)
Upvotes: 3