Reputation: 47

Joining multiple tables returning duplicates

I am trying the following select statement including columns from 4 tables. But the results return each row 4 times, im sure this is because i have multiple left joins but i have tried other joins and cannot get the desired result.

    select table1.empid,table2.name,table2.datefrom, table2.UserDefNumber1, table3.UserDefNumber1, table4.UserDefChar6
from table1
inner join table2
on table2.empid=table1.empid
inner join table3
on table3.empid=table1.empid
inner join table4
on table4.empid=table1.empid

where MONTH(table2.datefrom) = Month (Getdate())

I need this to return the data without any duplicates so only 1 row for each entry.

I would also like the "where Month" clause at the end look at the previous month not the current month but struggling with that also.

I am a bit new to this so i hope it makes sense.

Thanks

Upvotes: 2

Answers (3)

flo

Reputation: 10291

If the duplicate rows are identical on each column you can use the DISTINCT keyword to eliminate those duplicates.

But I think you should reconsider your JOIN or WHERE clause, because there has to be a reason for those duplicates:

The WHERE clause hits several rows in table2 having the same month on a single empid
There are several rows with the same empid in one of the other tables
both of the above is true

You may want to rule those duplicate rows out by conditions in WHERE/JOIN instead of the DISTINCT keyword as there may be unexpected behaviour when some data is changing in a single row of the original resultset. Then you start having duplicate empids again.

You can check if a date is in the previous month by following clause:

date BETWEEN dateadd(mm, -1, datefromparts(year(getdate()), month(getdate()), 1)) 
AND datefromparts(year(getdate()), month(getdate()), 1)

This statment uses DATEFROMPARTS to create the beginning of the current month twice, subtract a month from the first one by using DATEADD (results in the beginning of the previous month) and checks if date is between those dates using BETWEEN.

Upvotes: 1

Gordon Linoff

Reputation: 1271003

If your query is returning duplicates, then one or more of the tables have duplicate empid values. This is a data problem. You can find them with queries like this:

select empid, count(*)
from table1
group by empid
having count(*) > 1;

You should really fix the data and query so it returns what you want. You can do a bandage solution with select distinct, but I would not usually recommend that. Something is causing the duplicates, and if you do not understand why, then the query may not be returning the results you expect.

As for your where clause. Given your logic, the proper way to express this would include the year:

where year(table2.datefrom) = year(getdate()) and
      month(table2.datefrom) = month(Getdate())

Although there are other ways to express this logic that are more compatible with indexes, you can continue down this course with:

where year(table2.datefrom) * 12 + month(table2.datefrom) = year(getdate()) * 12 + Month(Getdate()) - 1

That is, convert the months to a number of months since time zero and then use month arithmetic.

If you care about indexes, then your current where clause would look like:

where table2.datefrom >= dateadd(day,
                                 - (day(getdate) - 1),
                                 cast(getdate() as date) and
      table2.datefrom < dateadd(day,
                                 - (dateadd(month, 1, getdate()) - 1),
                                 cast(dateadd(month, 1, getdate()) as date)

Upvotes: 1

paul

Reputation: 22021

Eliminate duplicates from your query by including the distinct keyword immediately after select

Comparing against a previous month is slightly more complicated. It depends what you mean:

If the report was run on the 23rd Jan 2015, would you want 01/12/2014-31/12/2014 or 23/12/2014-22/01/2015?

Upvotes: 0

Joining multiple tables returning duplicates

Answers (3)

Related Questions