im_tsm
im_tsm

Reputation: 2051

SQL Server: Join two tables and only get data of one table without identical rows

I have two tables, not the actual ones, I am trying to replicate the situation here:

tblMstPost with a large number of columns(say 20). I am adding just two here.

PostId      Title
1           First Post
2           Second Post
3           Third post

tblTrnComment

CommentId      PostId       Comment
1              1            Hello
2              1            Hi
3              1            Hey
4              2            Test
5              3            Hello

Now I want data from post table only. I don't need any data from Comment table. The condition to get data from post table is that I need posts which have the comment "Hello" and "Hi".

Now I can write something like this:

SELECT p.*
FROM  tblMstPost AS p 
INNER JOIN tblTrnComment AS c 
    ON p.PostId = c.PostId
WHERE c.CommentId IN (1, 2, 5)

Above query will give results with two identical rows with PostId 1.

PostId      Title
    1       First Post
    1       First Post
    3       Third post

Now I want to remove one of the identical rows. I have tried DISTINCT but one of my columns on Post table has text data type and for this reason, DISTINCT is not working. When I GROUP BY p.PostId SQL server asks the same for all of the columns: [column] is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. Please note that post table has a large number of columns and I don't want to add all of them to GROUP BY statement.

Is there any solution other than using WHERE IN subquery?

Update: I have found this video on Youtube which clearly explains exists and my question: https://youtu.be/zfgJ3ZmAgNw

Upvotes: 0

Views: 75

Answers (4)

Diado
Diado

Reputation: 2257

To avoid a WHERE IN clause, you could join to a distinct set of Post Ids which have the appropriate comment:

SELECT p.*
FROM   tblMstPost    p
JOIN   (    SELECT DISTINCT PostId
            FROM   tblTrnComment
            WHERE  Comment = 'Hello'
            OR     Comment = 'Hi'     -- Or have an IN here, or a lookup etc
       )    t
ON     p.PostId    = t.PostId

Upvotes: 0

Yogesh Sharma
Yogesh Sharma

Reputation: 50163

JOIN might produce duplicate rows as because of tblTrnComment table has many rows associate with PostId from tblMstPost table. So, what you need is IN or EXISTS

So, you don't need to use JOIN

select p.*
from tblMstPost p
where p.PostId in (select PostId 
                   from tblTrnComment 
                   where Comment in ('Hi', 'Hello')
                   );

Now, i would suggest EXISTS instead

select p.*
from tblMstPost p
where exists(select 1 
             from tblTrnComment c
             where c.PostId = p.PostId and 
                   c.Comment in ('Hi', 'Hello')
             );

Upvotes: 1

Michał Turczyn
Michał Turczyn

Reputation: 37337

Try

select * from tblMstPost p
where exists(select 1 from tblTrnComment
             where PostId = p.PostId and Comment in ('Hi', 'Hello'));

Upvotes: 2

MarkD
MarkD

Reputation: 1063

For grouping you need to group all of the rows, as you put group by id the compiler didnt know what to do with the titles, does it group the same, should it aggregate them all. so you list all non aggregated columns in your group by like below.

SELECT p.*
    FROM  tblMstPost AS p 
    INNER JOIN tblTrnComment AS c 
        ON p.PostId = c.PostId
    GROUP BY PostId, Title
    WHERE c.CommentId IN (1, 2, 5)

Upvotes: 0

Related Questions